Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-10 Thread Kuhu Shukla
+1 (non-binding)

- built from source on Mac
- deployed on a pseudo distributed one node cluster
- ran example jobs like sleep and wordcount.

Thank you for all the work on this release.
Regards,
Kuhu

On Thu, Jan 10, 2019 at 10:32 AM Craig.Condit 
wrote:

> +1 (non-binding)
>
> - built from source on CentOS 7.5
> - deployed single node cluster
> - ran several yarn jobs
> - ran multiple docker jobs, including spark-on-docker
>
> On 1/8/19, 5:42 AM, "Sunil G"  wrote:
>
> Hi folks,
>
>
> Thanks to all of you who helped in this release [1] and for helping to
> vote on RC0. I have created the second release candidate (RC1) for Apache
> Hadoop 3.2.0.
>
>
> Artifacts for this RC are available here:
>
> http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/
>
>
> RC tag in git is release-3.2.0-RC1.
>
>
>
> The maven artifacts are available via repository.apache.org at
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1178/
>
>
> This vote will run for 7 days (5 weekdays), ending on 14th Jan at 11:59 pm
> PST.
>
>
>
> 3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. The feature
> additions below are the highlights of this release.
>
> 1. Node Attributes Support in YARN
>
> 2. Hadoop Submarine project for running Deep Learning workloads on YARN
>
> 3. Support service upgrade via YARN Service API and CLI
>
> 4. HDFS Storage Policy Satisfier
>
> 5. Support Windows Azure Storage - Blob file system in Hadoop
>
> 6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
>
> 7. Improvements in Router-based HDFS federation
>
>
>
> Thanks to Wangda, Vinod, and Marton for helping me prepare the
> release.
>
> I have done some testing with my pseudo cluster. My +1 to start.
>
>
>
> Regards,
>
> Sunil
>
>
>
> [1]
>
>
> https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
>
> [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in
> (3.2.0)
> AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
> ORDER BY fixVersion ASC
>
>
>


[jira] [Created] (MAPREDUCE-7164) FileOutputCommitter does not report progress while merging paths.

2018-11-26 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-7164:
--

 Summary: FileOutputCommitter does not report progress while 
merging paths.
 Key: MAPREDUCE-7164
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7164
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.9.2, 2.8.5, 3.0.3
 Environment: In cases where the rename and merge path logic takes more 
time than usual, the committer does not report progress, which can cause job 
failure. This behavior was not present in Hadoop 1.x. This JIRA will restore 
the old 1.x behavior.
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
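The general shape of such a fix can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, not the actual Hadoop patch): invoke a progress callback after each path is merged, so the framework keeps seeing the task as live during a long rename/merge:

```java
import java.util.List;

// Hypothetical sketch: a merge loop that reports progress per path so a
// slow rename/merge cannot silently exceed the task timeout.
public class ProgressReportingMerge {

    // Minimal stand-in for org.apache.hadoop.util.Progressable.
    public interface Progressable {
        void progress();
    }

    // Merges each path (work elided) and calls progress() once per path.
    // Returns the number of paths processed.
    public static int mergePaths(List<String> paths, Progressable reporter) {
        int merged = 0;
        for (String path : paths) {
            // ... rename/merge work for `path` would happen here ...
            merged++;
            reporter.progress();  // heartbeat after each unit of work
        }
        return merged;
    }

    public static void main(String[] args) {
        final int[] beats = {0};
        int merged = mergePaths(List.of("part-0", "part-1", "part-2"),
                () -> beats[0]++);
        System.out.println(merged + " paths merged, "
                + beats[0] + " progress calls");
    }
}
```

The point of the sketch is only the call pattern: one heartbeat per unit of merge work, rather than a single report at the end.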






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7141) Allow KMS generated spill encryption keys

2018-09-19 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-7141:
--

 Summary: Allow KMS generated spill encryption keys
 Key: MAPREDUCE-7141
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7141
 Project: Hadoop Map/Reduce
  Issue Type: Task
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


Currently, the encryption key for task spills can only be generated by the 
AM's key generator. This JIRA tracks the work required to add KMS support for 
generating this key, which allows fault tolerance across AM failures/re-runs 
and gives the client another option for how the keys are created.
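For context, here is a self-contained sketch of the kind of local key generation the AM performs today (the class and method names are illustrative, not the actual MR code); a KMS-backed `KeyProvider` would replace this local `KeyGenerator` so the key survives AM re-runs:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

// Illustrative only: local spill-key generation in the style of the AM's
// key generator. A key generated locally like this is lost if the AM is
// re-run; a KMS-backed provider would make it recoverable.
public class SpillKeyDemo {

    // Generates a fresh symmetric key of the given size in bits.
    public static SecretKey newSpillKey(String algorithm, int bits)
            throws NoSuchAlgorithmException {
        KeyGenerator gen = KeyGenerator.getInstance(algorithm);
        gen.init(bits);
        return gen.generateKey();
    }

    public static void main(String[] args) throws Exception {
        // 128-bit key -> 16 bytes of key material
        SecretKey key = newSpillKey("HmacSHA1", 128);
        System.out.println(key.getEncoded().length);
    }
}
```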






[jira] [Created] (MAPREDUCE-7122) MRAppMaster should not exit before shutdown when an error is encountered

2018-07-12 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-7122:
--

 Summary: MRAppMaster should not exit before shutdown when an error 
is encountered
 Key: MAPREDUCE-7122
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7122
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.8.4
Reporter: Kuhu Shukla


In a scenario where the AM fails with an exception and calls the exit method, 
it fails to shut down cleanly and can prevent the JHS from shutting down 
cleanly as well.

{code}
appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
  @Override
  public Object run() throws Exception {
    appMaster.init(conf);
    appMaster.start();
    if (appMaster.errorHappenedShutDown) {
      throw new IOException("Was asked to shut down.");
    }
    return null;
  }
});
{code}
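One way to see the ordering problem (a hedged sketch with hypothetical names, not the actual MRAppMaster code): if the error path throws or exits before stop() runs, cleanup is skipped; wrapping the error check so that stop() runs on both paths guarantees the service shuts down before the process exits:

```java
// Sketch of the shutdown-ordering issue described above. If an error is
// detected after start(), the service must still be stopped so that
// cleanup (history flush, etc.) happens before the process exits.
public class ShutdownOrderDemo {

    static class Service {
        boolean started;
        boolean stopped;
        void start() { started = true; }
        void stop()  { stopped = true; }  // cleanup work lives here
    }

    // Runs the service; the finally block guarantees stop() runs on both
    // the normal path and the error path.
    public static Service runWithCleanShutdown(boolean errorHappened) {
        Service svc = new Service();
        svc.start();
        try {
            if (errorHappened) {
                throw new IllegalStateException("Was asked to shut down.");
            }
        } catch (IllegalStateException e) {
            // would log the error and record a non-zero exit code here
        } finally {
            svc.stop();  // never skipped, even when an error occurred
        }
        return svc;
    }

    public static void main(String[] args) {
        System.out.println(runWithCleanShutdown(true).stopped);   // true
        System.out.println(runWithCleanShutdown(false).stopped);  // true
    }
}
```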






Re: [VOTE] Release Apache Hadoop 3.0.1 (RC1)

2018-03-23 Thread Kuhu Shukla
+1 (non-binding)

Built from source.
Installed on a pseudo distributed cluster.
Ran word count job and basic hdfs commands.

Thank you for the effort on this release.

Regards,
Kuhu

On Thu, Mar 22, 2018 at 5:25 PM, Elek, Marton  wrote:

>
> +1 (non binding)
>
> I did a full build from source code, created a docker container and did
> various basic level tests with robotframework based automation and
> docker-compose based pseudo clusters[1].
>
> Including:
>
> * Hdfs federation smoke test
> * Basic ViewFS configuration
> * Yarn example jobs
> * Spark example jobs (with and without yarn)
> * Simple hive table creation
>
> Marton
>
>
> [1]: https://github.com/flokkr/runtime-compose
>
> On 03/18/2018 05:11 AM, Lei Xu wrote:
>
>> Hi, all
>>
>> I've created release candidate RC-1 for Apache Hadoop 3.0.1
>>
>> Apache Hadoop 3.0.1 will be the first bug-fix release for the Apache
>> Hadoop 3.0 release line. It includes 49 bug fixes and security fixes,
>> of which 12 are blockers and 17 are critical.
>>
>> Please note:
>> * HDFS-12990. Change default NameNode RPC port back to 8020. This is an
>> incompatible change relative to Hadoop 3.0.0. After 3.0.1 is released,
>> Apache Hadoop 3.0.0 will be deprecated due to this change.
>>
>> The release page is:
>> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release
>>
>> New RC is available at: http://home.apache.org/~lei/hadoop-3.0.1-RC1/
>>
>> The git tag is release-3.0.1-RC1, and the latest commit is
>> 496dc57cc2e4f4da117f7a8e3840aaeac0c1d2d0
>>
>> The maven artifacts are available at:
>> https://repository.apache.org/content/repositories/orgapachehadoop-1081/
>>
>> Please try the release and vote; the vote will run for the usual 5
>> days, ending on 3/22/2018, 6pm PST.
>>
>> Thanks!
>>
>> -
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>>
>>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


Re: [VOTE] Release Apache Hadoop 2.8.3 (RC0)

2017-12-12 Thread Kuhu Shukla
+1 non-binding.
 - Built and installed from source on a pseudo distributed cluster.
 - Ran sample jobs like wordcount, sleep etc.
 - Ran Tez 0.9 sample jobs.

Regards,
Kuhu


On Mon, Dec 11, 2017 at 7:31 PM, Brahma Reddy Battula 
wrote:

> +1 (non-binding), thanks Junping for driving this.
>
>
> --Built from the source
> --Installed a 3-node HA cluster
> --Verified Basic shell Commands
> --Browsed the HDFS/YARN web UI
> --Ran sample pi,wordcount jobs
>
> --Brahma Reddy Battula
>
>
> On Tue, Dec 5, 2017 at 3:28 PM, Junping Du  wrote:
>
> > Hi all,
> >  I've created the first release candidate (RC0) for Apache Hadoop
> > 2.8.3. This is our next maint release to follow up 2.8.2. It includes 79
> > important fixes and improvements.
> >
> >   The RC artifacts are available at: http://home.apache.org/~
> > junping_du/hadoop-2.8.3-RC0
> >
> >   The RC tag in git is: release-2.8.3-RC0
> >
> >   The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1072
> >
> >   Please try the release and vote; the vote will run for the usual 5
> > working days, ending on 12/12/2017 PST time.
> >
> > Thanks,
> >
> > Junping
> >
>
>
>
> --
>
>
>
> --Brahma Reddy Battula
>


Re: [VOTE] Release Apache Hadoop 2.9.0 (RC3)

2017-11-15 Thread Kuhu Shukla
+1 (non-binding)
1. Built from source.
2. Deployed to a pseudo-distributed one-node cluster.
3. Ran sample jobs.

Regards,
Kuhu


On Wed, Nov 15, 2017 at 1:07 PM, Eric Badger 
wrote:

> +1 (non-binding)
>
> - Verified all hashes and checksums
> - Built from source on macOS 10.12.6, Java 1.8.0u65
> - Deployed a pseudo cluster
> - Ran some example jobs
>
> Thanks,
>
> Eric
>
> On Wed, Nov 15, 2017 at 10:24 AM, Carlo Aldo Curino <
> carlo.cur...@gmail.com>
> wrote:
>
> > +1 (binding)
> >
> > On Nov 15, 2017 8:23 AM, "Mukul Kumar Singh" 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > I built from source on Mac OS X 10.13.1 Java 1.8.0_111
> > >
> > > - Deployed on a single node cluster.
> > > - Deployed a ViewFS cluster with two hdfs mount points.
> > > - Performed basic sanity checks.
> > > - Performed DFS operations(put, ls, mkdir, touch)
> > >
> > > Thanks,
> > > Mukul
> > >
> > >
> > > > On 14-Nov-2017, at 5:40 AM, Arun Suresh  wrote:
> > > >
> > > > Hi Folks,
> > > >
> > > > Apache Hadoop 2.9.0 is the first release of the Hadoop 2.9 line and will
> > > > be the starting release for the Apache Hadoop 2.9.x line - it includes 30
> > > > new features with 500+ subtasks, 407 improvements, and 790 bug fixes, all
> > > > newly fixed issues since 2.8.2.
> > > >
> > > > More information about the 2.9.0 release plan can be found here:
> > > > https://cwiki.apache.org/confluence/display/HADOOP/Roadmap#Roadmap-Version2.9
> > > >
> > > > New RC is available at: https://home.apache.org/~asuresh/hadoop-2.9.0-RC3/
> > > >
> > > > The RC tag in git is: release-2.9.0-RC3, and the latest commit id is:
> > > > 756ebc8394e473ac25feac05fa493f6d612e6c50.
> > > >
> > > > The maven artifacts are available via repository.apache.org at:
> > > > https://repository.apache.org/content/repositories/orgapachehadoop-1068/
> > > >
> > > > We are carrying over the votes from the previous RC given that the
> > > > delta is the license fix.
> > > >
> > > > Given the above - we are also going to stick with the original deadline
> > > > for the vote: ending on Friday 17th November 2017, 2pm PT.
> > > >
> > > > Thanks,
> > > > -Arun/Subru
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> > > For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> > >
> > >
> >
>


Re: [VOTE] Release Apache Hadoop 2.7.4 (RC0)

2017-08-03 Thread Kuhu Shukla
+1 (non-binding)

1. Verified signatures and digests.
2. Built source.
3. Installed on a pseudo-distributed cluster.
4. Ran sample MR jobs and Tez example jobs like orderedwordcount successfully.

Thank you Konstantin and others for this release.

Regards,
Kuhu



On Thursday, August 3, 2017, 7:19:07 AM CDT, Sunil G  wrote:


Thanks Konstantin

+1 (binding)

1. Built tar ball from source package.
2. Ran basic MR jobs and verified UI.
3. Enabled node labels and ran sleep job. Works fine.
4. Verified CLI commands related to node labels; they work fine.
5. RM work-preserving restart cases are also verified and look fine.

Thanks
Sunil



On Sun, Jul 30, 2017 at 4:59 AM Konstantin Shvachko 
wrote:

> Hi everybody,
>
> Here is the next release of the Apache Hadoop 2.7 line. The previous stable
> release, 2.7.3, has been available since 25 August 2016.
> Release 2.7.4 includes 264 issues fixed after release 2.7.3, which are
> critical bug fixes and major optimizations. See more details in Release
> Note:
> http://home.apache.org/~shv/hadoop-2.7.4-RC0/releasenotes.html
>
> The RC0 is available at: http://home.apache.org/~shv/hadoop-2.7.4-RC0/
>
> Please give it a try and vote on this thread. The vote will run for 5 days
> ending 08/04/2017.
>
> Please note that my up-to-date public key is available from:
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
> Please don't forget to refresh the page if you've been there recently.
> There are other places on Apache sites which may contain my outdated key.
>
> Thanks,
> --Konstantin
>

Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

2017-03-16 Thread Kuhu Shukla
+1 (non-binding)
- Downloaded source.
- Verified signatures.
- Compiled the source.
- Ran sample jobs like MR sleep on a pseudo distributed cluster. (Mac OS)

Thanks Junping and others!

Regards,
Kuhu
On Wednesday, March 15, 2017, 7:25:46 PM CDT, Junping Du  wrote:

bq. From my read of the poms, hadoop-client depends on hadoop-hdfs-client 
to pull in HDFS-related code. It doesn't have its own dependency on 
hadoop-hdfs. So I think this affects users of the hadoop-client artifact, which 
has existed for a long time.

I could have missed that. Thanks for the reminder! From my quick check of 
https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client/2.7.3, it 
sounds like 669 artifacts from other projects depend on it.


I think we should withdraw the current RC bits. Please stop the verification & 
vote.

I will kick off another RC immediately when HDFS-11431 get fixed.


Thanks,


Junping



From: Andrew Wang 
Sent: Wednesday, March 15, 2017 2:04 PM
To: Junping Du
Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 2.8.0 (RC2)

Hi Junping, inline,


From my understanding, this issue is related to our previous improvements with 
separating client and server jars in HDFS-6200. If we use the new "client" jar 
in NN HA deployment, then we will hit the issue reported.

From my read of the poms, hadoop-client depends on hadoop-hdfs-client to pull 
in HDFS-related code. It doesn't have its own dependency on hadoop-hdfs. So I 
think this affects users of the hadoop-client artifact, which has existed for 
a long time.

Essentially all of our customer deployments run with NN HA, so this would 
affect a lot of users.

I can see two options here:

- Without any change in 2.8.0: if users hit the issue when deploying an HA 
cluster with the new client jar, they can add back the hdfs jar, just as things 
worked previously.

- Make the change now in 2.8.0, either moving ConfiguredFailoverProxyProvider 
to the client jar or adding a dependency between the client jar and the server 
jar. There may be arguments over which fix is better, especially since 
ConfiguredFailoverProxyProvider still has some server-side dependencies.


I would prefer the first option, given:

- The time needed to fix the issue is unpredictable, as there are still 
discussions on how to fix it. Our 2.8.0 release shouldn't be an endless 
journey; it has already been deferred several times for more serious issues.

Looks like we have a patch being actively revved and reviewed to fix this by 
making hadoop-hdfs-client depend on hadoop-hdfs. Thanks to Steven and Steve for 
working on this.

Steve proposed doing a proper split in a later JIRA.

- We have a workaround for this improvement, and no regression happens due to 
this issue. People can still use the hdfs jar the old way. The worst case is 
that an HDFS improvement doesn't work in some cases - that shouldn't block the 
whole release.

Based on the above, I think there is a regression for users of the 
hadoop-client artifact.

If it actually only affects users of hadoop-hdfs-client, then I agree we can 
document it as a Known Issue and fix it later.

Best,
Andrew

Re: [VOTE] Release Apache Hadoop 3.0.0-alpha2 RC0

2017-01-25 Thread Kuhu Shukla
+1 (non-binding)
* Built from source
* Deployed on a pseudo-distributed cluster (Mac)
* Ran wordcount and sleep jobs.
 

On Wednesday, January 25, 2017 3:21 AM, Marton Elek  
wrote:
 

 Hi,

I also did a quick smoketest with the provided 3.0.0-alpha2 binaries:

TLDR; It works well

Environment:
 * 5 hosts, docker based hadoop cluster, every component in separated container 
(5 datanode/5 nodemanager/...)
 * Components are:
  * Hdfs/Yarn cluster (upgraded 2.7.3 to 3.0.0-alpha2 using the binary package 
for vote)
  * Zeppelin 0.6.2/0.7.0-RC2
  * Spark 2.0.2/2.1.0
  * HBase 1.2.4 + zookeeper
  * + additional docker containers for configuration management and monitoring
* No HA, no kerberos, no wire encryption

 * HDFS cluster upgraded successfully from 2.7.3 (with about 200G data)
 * Imported 100G data to HBase successfully
 * Started Spark jobs to process 1G json from HDFS (using spark-master/slave 
cluster). It worked even when I used the Zeppelin 0.6.2 + Spark 2.0.2 (with old 
hadoop client included). Obviously the old version can't use the new Yarn 
cluster as the token file format has been changed.
 * I upgraded my setup to use Zeppelin 0.7.0-RC2/Spark 2.1.0(distribution 
without hadoop)/hadoop 3.0.0-alpha2. It also worked well: processed the same 
json files from HDFS with spark jobs (from zeppelin) using yarn cluster 
(master: yarn deploy-mode: cluster)
 * Started spark jobs (with spark submit, master: yarn) to count records from 
the hbase database: OK
 * Started example Mapreduce jobs from distribution over yarn. It was OK but 
only with specific configuration (see below)

So my overall impression that it works very well (at least with my 'smalldata')

Some notes (none of them are blocking):

1. To run the example mapreduce jobs I defined HADOOP_MAPRED_HOME at command 
line:
./bin/yarn jar 
share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi 
-Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 
-Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 10 10

And in the yarn-site:

yarn.nodemanager.env-whitelist: 
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,MAPRED_HOME_DIR

I don't know the exact reason for the change, but 2.7.3 was more 
user-friendly, as the example could be run without specific configuration.

For the same reason I didn't start an hbase mapreduce job with the hbase 
command-line app (there could be some option for hbase to define 
MAPRED_HOME_DIR as well, but by default I got a ClassNotFoundException for one 
of the MR classes)

2. For the record: the logging and htrace classes are excluded from the shaded 
hadoop client jar, so I added them manually one by one to the spark jars (spark 
2.1.0 distribution without hadoop):

RUN wget `cat url` -O spark.tar.gz && tar zxf spark.tar.gz && rm spark.tar.gz 
&& mv spark* spark
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-api-3.0.0-alpha2.jar 
/opt/spark/jars
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.0.0-alpha2.jar 
/opt/spark/jars
ADD 
https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar
 /opt/spark/jars
ADD 
https://repo1.maven.org/maven2/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar
 /opt/spark/jars
ADD 
https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar 
/opt/spark/jars/
ADD https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar 
/opt/spark/jars

With this jars files spark 2.1.0 works well with the alpha2 version of HDFS and 
YARN.

3. The message "Upgrade in progress. Not yet finalized." did not disappear 
from the namenode web UI, but the cluster works well.

Most probably I missed doing something, but it's a little bit confusing.

(I checked the REST call; it is the jmx bean that reports that it was not yet 
finalized. The code of the webpage seems to be ok.)

Regards
Marton

On Jan 25, 2017, at 8:38 AM, Yongjun Zhang 
> wrote:

Thanks Andrew much for the work here!

+1 (binding).

- Downloaded both binary and src tarballs
- Verified md5 checksum and signature for both
- Built from source tarball
- Deployed 2 pseudo clusters, one with the released tarball and the other
 with what I built from source, and did the following on both:
    - Run basic HDFS operations, snapshots and distcp jobs
    - Run pi job
    - Examined HDFS webui, YARN webui.

Best,

--Yongjun


On Tue, Jan 24, 2017 at 3:56 PM, Eric Badger 
>
wrote:

+1 (non-binding)
- Verified signatures and md5
- Built from source
- Started single-node cluster on my mac
- Ran some sleep jobs

Eric

  On Tuesday, January 24, 2017 4:32 PM, Yufei Gu 
>
wrote:


Hi Andrew,

Thanks for working on this.

+1 (Non-Binding)

1. Downloaded the binary and verified the md5.

Re: [VOTE] Release Apache Hadoop 3.0.0-alpha1 RC0

2016-09-02 Thread Kuhu Shukla
+1 (non-binding)

* Successfully downloaded and built from source.
* Deployed to single node cluster.
* Ran Sleep and Wordcount jobs.

Thanks Andrew for the effort!

Regards,
Kuhu

On Thursday, September 1, 2016 10:32 PM, Arun Suresh  
wrote:
 

 +1 (binding).

Thanks for driving this Andrew..

* Download and built from source.
* Setup a 5 node cluster.
* Verified that MR works with opportunistic containers
* Verified that the AMRMClient supports 'allocationRequestId'

Cheers
-Arun

On Thu, Sep 1, 2016 at 4:31 PM, Aaron Fabbri  wrote:

> +1, non-binding.
>
> I built everything on OS X and ran the s3a contract tests successfully:
>
> mvn test -Dtest=org.apache.hadoop.fs.contract.s3a.\*
>
> ...
>
> Results :
>
>
> Tests run: 78, Failures: 0, Errors: 0, Skipped: 1
>
>
> [INFO]
> 
>
> [INFO] BUILD SUCCESS
>
> [INFO]
> 
>
> On Thu, Sep 1, 2016 at 3:39 PM, Andrew Wang 
> wrote:
>
> > Good point Allen, I forgot about `hadoop version`. Since it's populated
> by
> > a version-info.properties file, people can always cat that file.
> >
> > On Thu, Sep 1, 2016 at 3:21 PM, Allen Wittenauer <
> a...@effectivemachines.com
> > >
> > wrote:
> >
> > >
> > > > On Sep 1, 2016, at 3:18 PM, Allen Wittenauer <
> a...@effectivemachines.com
> > >
> > > wrote:
> > > >
> > > >
> > > >> On Sep 1, 2016, at 2:57 PM, Andrew Wang 
> > > wrote:
> > > >>
> > > >> Steve requested a git hash for this release. This led us into a
> brief
> > > >> discussion of our use of git tags, wherein we realized that although
> > > >> release tags are immutable (start with "rel/"), RC tags are not.
> This
> > is
> > > >> based on the HowToRelease instructions.
> > > >
> > > >      We should probably embed the git hash in one of the files that
> > > gets gpg signed.  That's an easy change to create-release.
> > >
> > >
> > >        (Well, one more easily accessible than 'hadoop version')
> >
>


   

Re: [VOTE] Release Apache Hadoop 2.7.3 RC2

2016-08-19 Thread Kuhu Shukla
+1 (non-binding).
- Downloaded tarball (source and binary)
- Verified signatures.
- Compiled, built source code and deployed on a single node cluster
- Ran sample MR jobs (Sleep, Wordcount) and some "hadoop fs" commands.

Thanks a lot Vinod for your work on this release!

Regards,
Kuhu Shukla

On Friday, August 19, 2016 5:32 PM, Eric Payne 
<eric.payne1...@yahoo.com.INVALID> wrote:
 

 Thanks, Vinod, for working so hard on each 2.7 release.

+1 (non-binding)

Here's what I did:

- Built native 
- Installed on 3-node unsecure cluster
- Configured 2 queues with 2 separate label partitions 
- Verified that a job will successfully run on the correctly labelled node by 
specifying a non-default (but queue-accessible) label. 
- Verified that a distributed shell job would keep non-AM containers running 
across an App Master attempt restart. 
- Verified that preemption happens as expected (sort of). I say "sort of" 
because about twice as many containers were preempted as I thought should have 
been, but once the other underserved app began to run, it stopped preempting. 
Also, it didn't preempt between 2 queues with the same partition label. 
Partition preemption may not be supported in 2.7, so this is probably also okay.


Thanks!
Eric Payne





From: Vinod Kumar Vavilapalli <vino...@apache.org>
To: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>; 
hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; 
"mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org> 
Cc: Vinod Kumar Vavilapalli <vino...@apache.org>
Sent: Wednesday, August 17, 2016 9:05 PM
Subject: [VOTE] Release Apache Hadoop 2.7.3 RC2


Hi all,

I've created a new release candidate RC2 for Apache Hadoop 2.7.3.

As discussed before, this is the next maintenance release to follow up 2.7.2.

The RC is available for validation at: 
http://home.apache.org/~vinodkv/hadoop-2.7.3-RC2/

The RC tag in git is: release-2.7.3-RC2

The maven artifacts are available via repository.apache.org at 
https://repository.apache.org/content/repositories/orgapachehadoop-1046

The release-notes are inside the tar-balls at location 
hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I hosted 
this at http://home.apache.org/~vinodkv/hadoop-2.7.3-RC2/releasenotes.html for 
your quick perusal.

As you may have noted,
- a few issues with RC0 forced an RC1 [1]
- a few more issues with RC1 forced an RC2 [2]
- a very long fix-cycle for the License & Notice issues (HADOOP-12893) caused 
2.7.3 (along with every other Hadoop release) to slip by quite a bit. This 
release's related discussion thread is linked below: [3].

Please try the release and vote; the vote will run for the usual 5 days.

Thanks,
Vinod

[1] [VOTE] Release Apache Hadoop 2.7.3 RC0: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/index.html#26106
[2] [VOTE] Release Apache Hadoop 2.7.3 RC1: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg26336.html
[3] 2.7.3 release plan: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
(also http://markmail.org/thread/6yv2fyrs4jlepmmr)




   

Re: [VOTE] Release Apache Hadoop 2.7.3 RC1

2016-08-18 Thread Kuhu Shukla
Hi All,
Thank you for all the inputs on HDFS-9395. I have opened HDFS-10776 to discuss 
the modifications needed for audit logging to be consistent and comprehensive. 
We can move this discussion to the new JIRA.
Appreciate the support.

Regards,
Kuhu Shukla

On Thursday, August 18, 2016 12:04 PM, Chris Nauroth 
<cnaur...@hortonworks.com> wrote:
 

 Andrew, thanks for adding your perspective on this.

What is a realistic strategy for us to evolve the HDFS audit log in a 
backward-compatible way?  If the API is essentially any form of ad-hoc 
scripting, then for any proposed audit log format change, I can find a reason 
to veto it on grounds of backward incompatibility.

- I can’t add a new field on the end, because that would break an awk script 
that uses $NF expecting to find a specific field.
- I can’t prepend a new field, because that would break a "cut -f1" expecting 
to find the timestamp.
- HDFS can’t add any new features, because someone might have written a 
script that does "exit 1" if it finds an unexpected RPC in the "cmd=" field.
- Hadoop is not allowed to add full IPv6 support, because someone might have 
written a script that looks at the "ip=" field and parses it by IPv4 syntax.

On the CLI, a potential solution for evolving the output is to preserve the old 
format by default and only enable the new format if the user explicitly passes 
a new argument.  What should we do for the audit log?  Configuration flags in 
hdfs-site.xml?  (That of course adds its own brand of complexity.)

I’m particularly interested to hear potential solutions from people like 
Andrew and Allen who have been most vocal about the need for a stable format.  
Without a solution, this unfortunately devolves into the format being frozen 
within a major release line.

We could benefit from getting a patch on the compatibility doc that addresses 
the HDFS audit log specifically. 

--Chris Nauroth

On 8/18/16, 8:47 AM, "Andrew Purtell" <andrew.purt...@gmail.com> wrote:

    An incompatible API change is developer-unfriendly. An incompatible 
behavioral change is operator-unfriendly. Historically, one dimension of 
incompatibility has had a lot more mindshare than the other. It's great that 
this might be changing for the better. 
    
    Where I work when we move from one Hadoop 2.x minor to another we always 
spend time updating our deployment plans, alerting, log scraping, and related 
things due to changes. Some are debatable as if qualifying for the 
'incompatible' designation. I think the audit logging change that triggered 
this discussion is a good example of one that does. If you want to audit HDFS 
actions those log emissions are your API. (Inotify doesn't offer access control 
events.) One has to code regular expressions for parsing them and reverse 
engineer under what circumstances an audit line is emitted so you can make 
assumptions about what transpired. Change either and you might break someone's 
automation for meeting industry or legal compliance obligations. Not a trivial 
matter. If you don't operate Hadoop in production you might not realize the 
implications of such a change. Glad to see Hadoop has community diversity to 
recognize it in some cases. 
    
    > On Aug 18, 2016, at 6:57 AM, Junping Du <j...@hortonworks.com> wrote:
    > 
    > I think Allen's previous comments are very misleading. 
    > In my understanding, only incompatible APIs (RPC, CLIs, WebService, etc.) 
shouldn't land on branch-2, but other incompatible behaviors (logs, audit-log, 
daemon restart, etc.) should be treated flexibly for landing. Otherwise, how 
could 52 issues (https://s.apache.org/xJk5) marked with incompatible-changes 
have landed on branch-2 after the 2.2.0 release? Most of them are already 
released. 
    > 
    > Thanks,
    > 
    > Junping
    > 
    > From: Vinod Kumar Vavilapalli <vino...@apache.org>
    > Sent: Wednesday, August 17, 2016 9:29 PM
    > To: Allen Wittenauer
    > Cc: common-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
    > Subject: Re: [VOTE] Release Apache Hadoop 2.7.3 RC1
    > 
    > I always look at CHANGES.txt entries for incompatible-changes and this 
JIRA obviously wasn’t there.
    > 
    > Anyways, this shouldn’t be in any of branch-2.* as committers there 
clearly mentioned that this is an incompatible change.
    > 
    > I am reverting the patch from branch-2* .
    > 
    > Thanks
    > +Vinod
    > 
    >> On Aug 16, 2016, at 9:29 PM, Allen Wittenauer 
<a...@effectivemachines.com> wrote:
    >> 
    >> 
    >> 
    >> -1
    >> 
    >> HDFS-9395 is an incompatible change:
    >> 
    >> a) Why is it not marked as such in the changes file?

Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-09 Thread Kuhu Shukla
+1 (non-binding)
- Built source
- Deployed a single node cluster
- Ran a sample streaming job
- Ran basic hadoop fs commands.

Thanks Junping!

Regards,
Kuhu

On Tuesday, February 9, 2016 8:46 AM, Wangda Tan  
wrote:
 

 +1 (binding)

- Deployed a local cluster.
- Configured node labels on queues/nodes.
* Run job with node labels successfully.

Regards,
Wangda

On Tue, Feb 9, 2016 at 10:36 PM, Wangda Tan  wrote:

> Hi Eric,
>
> replaceLabelsOnNode CLI syntax is different between 2.6 and 2.7.
>
> As mentioned by Naga, in 2.6 you should use "node1<:port>,label
> node2,label ..."
> And in 2.7, you can use either "," or "=" to separate host and label.
>
> So 2.7 is backward-compatible with 2.6, but it's not guaranteed that 2.7's
> CLI syntax works on 2.6.
>
> Please let me know if you have any concerns about this.
>
> Thanks,
> Wangda
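The compatibility point above can be sketched in code. The following is an illustrative Java sketch of the two parsing behaviors, not the actual YARN RMAdminCLI implementation; the class and method names are made up for the example.

```java
public class NodeToLabelsArg {

  // 2.6-style parsing: only "," separates host and label.
  static String[] parse26(String arg) {
    int idx = arg.indexOf(',');
    return new String[] { arg.substring(0, idx), arg.substring(idx + 1) };
  }

  // 2.7-style parsing: "=" is accepted as well as ",", so any argument
  // written for the 2.6 CLI still parses under 2.7 (backward compatible),
  // while a "host=label" argument written for 2.7 would fail under 2.6.
  static String[] parse27(String arg) {
    int idx = arg.indexOf('=');
    if (idx < 0) {
      idx = arg.indexOf(',');
    }
    return new String[] { arg.substring(0, idx), arg.substring(idx + 1) };
  }

  public static void main(String[] args) {
    // Both forms resolve to the same host/label pair under 2.7-style parsing.
    System.out.println(String.join(" ", parse27("node1.example.com,gpu")));
    System.out.println(String.join(" ", parse27("node1.example.com=gpu")));
  }
}
```

This is only meant to make the "2.6 input works on 2.7, but not the reverse" statement concrete.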
>
> On Tue, Feb 9, 2016 at 4:19 AM, Eric Payne <
> eric.payne1...@yahoo.com.invalid> wrote:
>
>> Naganarasimha Garla, thanks for the reply.
>>
>> Yes, I used the node ID. I did not include a port. Here are the steps I
>> used, which work for me on 2.7:
>>
>> - yarn rmadmin -addToClusterNodeLabels abc
>> - yarn rmadmin -replaceLabelsOnNode hostname.company.com=abc
>> - configure queue properties as appropriate
>> - yarn rmadmin -refreshQueues
>>
>> As I say, this works for me when I try it on 2.7 and later. It's probably
>> something with my environment. I will continue to look into it.
>>
>> Thanks for your help
>> -Eric
>>
>>
>> 
>> From: Naganarasimha Garla 
>> To: mapreduce-dev@hadoop.apache.org; Eric Payne > >
>> Cc: "common-...@hadoop.apache.org" ; "
>> hdfs-...@hadoop.apache.org" ; "
>> yarn-...@hadoop.apache.org" 
>> Sent: Monday, February 8, 2016 1:01 PM
>> Subject: Re: [VOTE] Release Apache Hadoop 2.6.4 RC0
>>
>>
>>
>> +1 (non binding)
>>
>> * Downloaded hadoop-2.6.4-RC0-src.tar.gz, built from source (both package
>> and install), and verified the MD5 checksum
>>
>> * Did a Pseudo cluster and tested basic hdfs operations
>> * Ran sleep job and Pi job
>> * Added node label and ran job under the label by configuring
>> default-node-label-expression and it ran fine
>>
>> Eric Payne,
>> I hope you tried adding/replacing the labels using the NodeId/node address
>> and not the HTTP address!
>> I executed the following command to configure the label on a node:
>>   ./yarn rmadmin -replaceLabelsOnNode "localhost:43795,test1"
>> After this I was able to submit a job for the label.
>>
>> Regards,
>> + Naga
>>
>>
>> On Mon, Feb 8, 2016 at 11:06 PM, Eric Payne wrote:
>>
>> Hi Junping Du. Thank you for your work preparing this release.
>> I did the following things to test release Hadoop 2.6.4 RC0:
>> - Downloaded hadoop-2.6.4-RC0-src.tar.gz
>> - Built from source: package, install, and eclipse:eclipse
>> - Set up a 3-node, unsecured cluster with 3 queues, one of which has
>> preemption enabled
>> - Ran a successful test to ensure that preemption would happen to
>> containers on the preemptable queue if they were needed for an
>> application on another queue
>> - Ran successful streaming and yarn shell tests
>>
>> Junping, I did have a concern about labelled nodes and queues. Is full
>> label support backported to 2.6.4? I see that the syntax for the rmadmin
>> command lists label commands like -addToClusterNodeLabels and
>> -replaceLabelsOnNode. I was able to add a label (using
>> -addToClusterNodeLabels) and I was able to define a queue whose accessible
>> node label was listed with my specified label. However, when I tried to set
>> the node label on a specific node using -replaceLabelsOnNode, the label
>> did not show up on the specified node in the cluster nodes UI
>> (http://RM:8088/cluster/nodes). I also confirmed that submitting a job to
>> the labelled queue gets accepted but never runs, which is the behavior I
>> would expect if no node had the specified label. I will also add that this
>> procedure works fine in 2.7.
>>
>> Thanks,
>> -Eric Payne
>> >
>> >      From: Junping Du 
>> > To: "hdfs-...@hadoop.apache.org" ; "
>> yarn-...@hadoop.apache.org" ; "
>> mapreduce-dev@hadoop.apache.org" ; "
>> common-...@hadoop.apache.org" 
>> > Sent: Wednesday, February 3, 2016 1:01 AM
>> > Subject: [VOTE] Release Apache Hadoop 2.6.4 RC0
>> >
>> >
>> >Hi community folks,
>> >  I've created a release candidate RC0 for Apache Hadoop 2.6.4 (the next
>> maintenance release following 2.6.3), according to the email thread on the
>> 2.6.4 release plan [1]. Below are the details of this release candidate:
>> >
>> >The RC is available for validation at:
>> >http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/
>> 

Re: [VOTE] Release Apache Hadoop 2.7.2 RC2

2016-01-18 Thread Kuhu Shukla
+1 (non-binding)
- Downloaded source tarball
- Successfully built from source tarball
- Deployed on a pseudo cluster
- Ran simple HDFS commands and a Hadoop streaming job successfully.

Thanks a lot Vinod!
Regards,
Kuhu

On Saturday, January 16, 2016 6:58 PM, Junping Du  
wrote:
 

 In addition, all new fixes backported from 2.6.3 are not listed under the 2.7.2 
entry of CHANGES.txt but remain listed under 2.6.3. I think this is fine whether 
we have multiple entries tracking the same commit or a single entry tracking the 
earliest one. The only thing that matters is that the commits in the release 
notes (coming from JIRA's Fix Versions) match the commits that actually landed 
on the branch, and that each commit is tracked by some entry in CHANGES.txt. I 
did manually check that all 155 fixes exist in the commit log and that we have 
an entry in CHANGES.txt tracking each commit on 2.7.2.
Let's keep focusing on deployment testing of 2.7.2. Thanks!

Thanks,

Junping

From: Akira AJISAKA 
Sent: Saturday, January 16, 2016 7:39 PM
To: common-...@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org; 
mapreduce-dev@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 2.7.2 RC2

Hi Xiao,

 >  From a quick comparison between the releasenotes.html and the
CHANGES.txt
 > files in source tarball, the number of total JIRAs is quite different.

In CHANGES.txt, JIRAs fixed in 2.6.1/2.6.2 are not listed under 2.7.2.
That is why the number of JIRAs is different.

Regards,
Akira

On 1/16/16 09:11, Xiao Chen wrote:
> Thanks Vinod for preparing the release package.
>
>
> +1 (non-binding).
>
> I verified the following:
>
>
>    - Successfully downloaded source tarball, verified md5
>    - Ran `mvn apache-rat:check` on source tarball, passed
>    - Successfully built from source tarball
>    - Successfully started a pseudo distributed cluster
>    - Ran some basic hdfs operations, then successfully ran a distcp
>
>
> 1 question:
>  From a quick comparison between the releasenotes.html and the CHANGES.txt
> files in source tarball, the number of total JIRAs is quite different.
> Seems like CHANGES.txt misses some of the fixes, because searching
> 'project in (HDFS, HADOOP, MAPREDUCE, YARN) AND resolution = Fixed AND
> fixVersion = 2.7.2' in JIRA agrees with releasenotes.html (count=155).
>
> Here is how I did the check:
>
>
>    - cat ~/Downloads/releasenotes.html | grep "https://issues.apache.org/jira/" \
>        | awk -F ">" '{print $3}' | awk -F "<" '{print $1}' | wc -l
>
>              returns 155
>
>
>    - for f in `find . -name CHANGES.txt |grep -v target`; do cat $f |grep
>    -B 1 "Release 2.7.1" |grep .*-[0-9] |grep -v "Release 2.7" ; done|wc -l
>
>              returns 103
>
> I'm not sure whether this is a problem or not.
> Best,
> -Xiao
>
> On Thu, Jan 14, 2016 at 8:57 PM, Vinod Kumar Vavilapalli wrote:
>
>> Hi all,
>>
>> I've created an updated release candidate RC2 for Apache Hadoop 2.7.2.
>>
>> As discussed before, this is the next maintenance release to follow up
>> 2.7.1.
>>
>> The RC is available for validation at:
>> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC2/
>>
>> The RC tag in git is: release-2.7.2-RC2
>>
>> The maven artifacts are available via repository.apache.org <
>> http://repository.apache.org/> at
>> https://repository.apache.org/content/repositories/orgapachehadoop-1027 <
>> https://repository.apache.org/content/repositories/orgapachehadoop-1027>
>>
>> The release-notes are inside the tar-balls at location
>> hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I
>> hosted this at
>> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC2/releasenotes.html <
>> http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for
>> your quick perusal.
>>
>> As you may have noted,
>>  - I terminated the RC1 related voting thread after finding out that we
>> didn’t have a bunch of patches that are already in the released 2.6.3
>> version. After a brief discussion, we decided to keep the parallel 2.6.x
>> and 2.7.x releases incremental, see [4] for this discussion.
>>  - The RC0 related voting thread got halted due to some critical issues.
>> It took a while again for getting all those blockers out of the way. See
>> the previous voting thread [3] for details.
>>  - Before RC0, an unusually long 2.6.3 release caused 2.7.2 to slip by
>> quite a bit. This release's related discussion threads are linked below:
>> [1] and [2].
>>
>> Please try the release and vote; the vote will run for the usual 5 days.
>>
>> Thanks,
>> Vinod
>>
>> [1]: 2.7.2 release plan: http://markmail.org/message/oozq3gvd4nhzsaes <
>> http://markmail.org/message/oozq3gvd4nhzsaes>
>> [2]: Planning Apache Hadoop 2.7.2

[jira] [Created] (MAPREDUCE-6603) Add counters for failed task attempts

2016-01-11 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-6603:
--

 Summary: Add counters for failed task attempts
 Key: MAPREDUCE-6603
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6603
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.6.3, 2.7.1
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
Priority: Minor


The counters for failed task attempts are currently unavailable. They would be 
nice to have for troubleshooting, while not being included in the aggregate 
counters at the task or job level; one should be able to view them at the 
attempt level.
{code}
Sorry it looks like task_1_2_r_3 has no counters. 
{code}
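A minimal sketch of the behavior being asked for, assuming per-attempt counters are stored but simply excluded from roll-ups. The types below (AttemptCounters, Attempt, State) are illustrative, not the real MapReduce classes:

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptCounters {
  enum State { SUCCEEDED, FAILED }

  static class Attempt {
    final State state;
    final Map<String, Long> counters;
    Attempt(State state, Map<String, Long> counters) {
      this.state = state;
      this.counters = counters;
    }
  }

  // Aggregate only successful attempts, as the task-level view does today;
  // failed attempts keep their counters and stay individually inspectable.
  static Map<String, Long> taskAggregate(Iterable<Attempt> attempts) {
    Map<String, Long> total = new HashMap<>();
    for (Attempt a : attempts) {
      if (a.state != State.SUCCEEDED) {
        continue;  // excluded from the roll-up, but not discarded
      }
      for (Map.Entry<String, Long> e : a.counters.entrySet()) {
        total.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return total;
  }
}
```

The key point is that excluding failed attempts from the aggregate does not require throwing their counters away.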



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6500) DynamicInputChunk and DynamicRecordReader class has no unit tests

2015-10-02 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-6500:
--

 Summary: DynamicInputChunk and DynamicRecordReader class has no 
unit tests
 Key: MAPREDUCE-6500
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6500
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: distcp
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
Priority: Minor


The Dynamic strategy of DistCp has test coverage only for its InputFormat 
class. It would be nice to have coverage for the DynamicRecordReader and 
DynamicInputChunk classes as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Release Apache Hadoop 2.6.1 RC1

2015-09-21 Thread Kuhu Shukla
Thanks everyone! I checked the following things and did not find any issues:
* Compilation
* Running unit tests under the org.apache.hadoop.yarn.server package
* Bringing up a single node cluster
* Running a WordCount job

Regards,
Kuhu Shukla

 On Monday, September 21, 2015 2:55 PM, Chang Li <lichang...@gmail.com> 
wrote:
   

 Thanks everyone who helped on this release!
I have run compilation and various jobs on a single node cluster. I have also
tested my contribution for YARN-3267
<https://issues.apache.org/jira/browse/YARN-3267> and verified that all
related unit tests pass.

Thank you,
Chang Li


  

[jira] [Created] (MAPREDUCE-6473) Job submission can take a long time during Cluster initialization

2015-09-09 Thread Kuhu Shukla (JIRA)
Kuhu Shukla created MAPREDUCE-6473:
--

 Summary: Job submission can take a long time during Cluster 
initialization
 Key: MAPREDUCE-6473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6473
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla


During initialization in Cluster.java, the framework provider classes are 
loaded inside a synchronized block, which can considerably increase job 
submission time when the number of concurrent submissions is high. The goal is 
to safely reduce the time spent in this synchronized block to improve 
performance.
{noformat}
synchronized (frameworkLoader) {
  for (ClientProtocolProvider provider : frameworkLoader) {
LOG.debug("Trying ClientProtocolProvider : "
+ provider.getClass().getName());
{noformat}
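One common way to shrink such a critical section is to populate an immutable cache once under the lock and let every later submission read it lock-free (double-checked locking on a volatile field). This is a hedged sketch of the idea, not the actual MAPREDUCE-6473 patch; CachedProviders and Provider are illustrative stand-ins for Cluster's real types:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for ClientProtocolProvider; not the Hadoop class.
interface Provider {
  String name();
}

public class CachedProviders {
  // Volatile so readers observe a fully constructed list without locking.
  private static volatile List<Provider> cache;

  static List<Provider> get(Iterable<Provider> loader) {
    List<Provider> local = cache;
    if (local == null) {                    // first check, no lock
      synchronized (CachedProviders.class) {
        local = cache;
        if (local == null) {                // second check, under the lock
          List<Provider> loaded = new ArrayList<>();
          for (Provider p : loader) {       // expensive iteration runs once
            loaded.add(p);
          }
          local = Collections.unmodifiableList(loaded);
          cache = local;
        }
      }
    }
    return local;
  }

  public static void main(String[] args) {
    Provider p = () -> "yarn";
    System.out.println(get(java.util.Arrays.asList(p)).size());
  }
}
```

With this shape, only the very first submission pays for loading; every subsequent submission iterates the cached list outside any lock, which is where the per-submission time was going.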



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)