[jira] [Created] (MAPREDUCE-6423) MapOutput Sampler

2015-07-01 Thread Ram Manohar Bheemana (JIRA)
Ram Manohar Bheemana created MAPREDUCE-6423:
---

 Summary: MapOutput Sampler
 Key: MAPREDUCE-6423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Priority: Minor


Need a sampler based on the map-output keys. The current InputSampler implementation 
has a major drawback: it assumes the input and output key types of a mapper are the 
same, which is generally not the case.

Proposed approach:
1. Create a sampler which samples the data based on the job's input.
2. Run a small map-reduce job in uber-task mode, using the original job's mapper and 
an identity reducer, to generate the required map-output sample keys.
3. Optionally, allow specifying which input files to sample. For example, given input 
files A and B, we should be able to specify that only file A is used for sampling.
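The idea above (run the job's map logic over part of the input, sample the resulting keys, and derive a partitioner from the sample) can be sketched without any Hadoop machinery. All names below are illustrative stand-ins, not the actual InputSampler API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/**
 * Hadoop-free sketch of the proposed map-output sampler: apply the job's map
 * function to the input, reservoir-sample the resulting keys, and pick
 * partition split points from the sorted sample.
 */
public class MapOutputSamplerSketch {

  /** Reservoir-sample up to sampleSize map-output keys. */
  static <I, K extends Comparable<K>> List<K> sampleKeys(
      List<I> input, Function<I, K> mapFn, int sampleSize, long seed) {
    List<K> reservoir = new ArrayList<>(sampleSize);
    Random rnd = new Random(seed);
    int seen = 0;
    for (I record : input) {
      K key = mapFn.apply(record);      // step 2: the original job's map logic
      if (seen < sampleSize) {
        reservoir.add(key);
      } else {
        int j = rnd.nextInt(seen + 1);  // classic reservoir sampling
        if (j < sampleSize) {
          reservoir.set(j, key);
        }
      }
      seen++;
    }
    return reservoir;
  }

  /** Pick (numPartitions - 1) split points from the sorted key sample,
   *  as a total-order partitioner would. */
  static <K extends Comparable<K>> List<K> splitPoints(
      List<K> sample, int numPartitions) {
    List<K> sorted = new ArrayList<>(sample);
    Collections.sort(sorted);
    List<K> splits = new ArrayList<>();
    for (int p = 1; p < numPartitions; p++) {
      splits.add(sorted.get(p * sorted.size() / numPartitions));
    }
    return splits;
  }
}
```

In the actual proposal the sampling pass would be the small uber-mode job with an identity reducer; this sketch only shows the key-sampling arithmetic.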



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6422) Add REST API for getting all attempts for all the tasks

2015-07-01 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created MAPREDUCE-6422:
--

 Summary: Add REST API for getting all attempts for all the tasks
 Key: MAPREDUCE-6422
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6422
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir


The web UI has a feature where one can get all attempts for all map tasks or 
reduce tasks. 
The REST API seems to be missing it. 
Should we add this in both HsWebService and AMWebService?
{code}
  @GET
  @Path("/mapreduce/jobs/{jobid}/tasks/attempts")
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public JobTaskAttemptsInfo getAllJobTaskAttempts(
      @Context HttpServletRequest hsr,
      @PathParam("jobid") String jid, @QueryParam("type") String type) {
  }
{code}

We might also add a query param on state to filter by succeeded attempts, etc.
Thoughts?
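One possible shape for that state filter, as a standalone sketch (TaskAttempt here is a stand-in type, not the real Hadoop web-service class):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Standalone sketch of the proposed ?state= query-param filter for the
 * attempts endpoint. Names are illustrative only.
 */
public class AttemptStateFilter {

  static class TaskAttempt {
    final String id;
    final String state;
    TaskAttempt(String id, String state) { this.id = id; this.state = state; }
  }

  /** Return all attempts, or only those matching the state query param. */
  static List<TaskAttempt> filterByState(List<TaskAttempt> attempts, String state) {
    if (state == null || state.isEmpty()) {
      return attempts;                 // no ?state= given: return everything
    }
    List<TaskAttempt> out = new ArrayList<>();
    for (TaskAttempt a : attempts) {
      if (a.state.equalsIgnoreCase(state)) {
        out.add(a);
      }
    }
    return out;
  }
}
```

The web service would apply this to the attempt list gathered across all tasks before building the JobTaskAttemptsInfo response.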



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Hadoop-Mapreduce-trunk - Build # 2191 - Failure

2015-07-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2191/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 32616 lines...]
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop MapReduce Client  SUCCESS [  2.748 s]
[INFO] Apache Hadoop MapReduce Core .. SUCCESS [01:31 min]
[INFO] Apache Hadoop MapReduce Common  SUCCESS [ 28.753 s]
[INFO] Apache Hadoop MapReduce Shuffle ... SUCCESS [  4.807 s]
[INFO] Apache Hadoop MapReduce App ... SUCCESS [08:38 min]
[INFO] Apache Hadoop MapReduce HistoryServer . SUCCESS [05:35 min]
[INFO] Apache Hadoop MapReduce JobClient . FAILURE [  01:42 h]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins . SKIPPED
[INFO] Apache Hadoop MapReduce NativeTask  SKIPPED
[INFO] Apache Hadoop MapReduce Examples .. SKIPPED
[INFO] Apache Hadoop MapReduce ... SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:59 h
[INFO] Finished at: 2015-07-01T16:03:50+00:00
[INFO] Final Memory: 34M/751M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn goals -rf :hadoop-mapreduce-client-jobclient
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Sending artifact delta relative to Hadoop-Mapreduce-trunk #2190
Archived 1 artifacts
Archive block size is 32768
Received 1 blocks and 20415997 bytes
Compression is 0.2%
Took 7.2 sec
Recording test results
Updating HADOOP-12124
Updating HADOOP-12116
Updating HADOOP-10798
Updating YARN-3827
Updating MAPREDUCE-6121
Updating HDFS-8635
Updating MAPREDUCE-6384
Updating YARN-3768
Updating HADOOP-12164
Updating YARN-3823
Updating HADOOP-12149
Updating MAPREDUCE-6407
Updating HADOOP-12159
Updating HADOOP-12158
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
1 tests failed.
REGRESSION:  org.apache.hadoop.mapred.TestLazyOutput.testLazyOutput

Error Message:
java.io.IOException: ResourceManager failed to start. Final state is STOPPED

Stack Trace:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: 
ResourceManager failed to start. Final state is STOPPED
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:329)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$500(MiniYARNCluster.java:98)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:455)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.mapred.MiniMRClientClusterFactory.create(MiniMRClientClusterFactory.java:80)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:187)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:175)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:167)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:159)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:152)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:145)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:138)
at org.apache.hadoop.mapred.MiniMRCluster.init(MiniMRCluster.java:133)
at 

Re: Planning Hadoop 2.6.1 release

2015-07-01 Thread Sean Busbey
Any update on a release plan for 2.6.1?

On Wed, Jun 10, 2015 at 1:25 AM, Brahma Reddy Battula 
brahmareddy.batt...@huawei.com wrote:

 HI vinod

 any update on this..? Are we planning to release 2.6.1, or can we make 2.7.1
 the stable release?


 Thanks & Regards
  Brahma Reddy Battula

 
 From: Zhihai Xu [z...@cloudera.com]
 Sent: Wednesday, May 13, 2015 12:04 PM
 To: mapreduce-dev@hadoop.apache.org
 Cc: common-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
 hdfs-...@hadoop.apache.org
 Subject: Re: Planning Hadoop 2.6.1 release

 Hi Akira,

 Can we also include YARN-3242? YARN-3242 fixed a critical ZKRMStateStore
 bug.
 It will work better with YARN-2992.

 thanks
 zhihai


 On Tue, May 12, 2015 at 10:38 PM, Akira AJISAKA 
 ajisa...@oss.nttdata.co.jp
 wrote:

  Thanks all for collecting jiras for 2.6.1 release. In addition, I'd like
  to include the following:
 
  * HADOOP-11343. Overflow is not properly handled in calculating final iv
  for AES CTR
  * YARN-2874. Dead lock in DelegationTokenRenewer which blocks RM to
  execute any further apps
  * YARN-2992. ZKRMStateStore crashes due to session expiry
  * YARN-3013. AMRMClientImpl does not update AMRM token properly
  * YARN-3369. Missing NullPointer check in AppSchedulingInfo causes RM to
  die
  * MAPREDUCE-6303. Read timeout when retrying a fetch error can be fatal
 to
  a reducer
 
  All of these are marked as blocker bug for 2.7.0 but not fixed in 2.6.0.
 
  Regards,
  Akira
 
 
  On 5/4/15 11:15, Brahma Reddy Battula wrote:
 
  Hello Vinod,
 
  I am thinking, can we include HADOOP-11491 also? Without this jira, harfs
  will not be usable when the cluster is installed in HA mode and we try to get
  a FileContext like below:
 
 
  Path path = new Path("har:///archivedLogs/application_1428917727658_0005-application_1428917727658_0008-1428927448352.har");
  FileSystem fs = path.getFileSystem(new Configuration());
  path = fs.makeQualified(path);
  FileContext fc = FileContext.getFileContext(path.toUri(), new Configuration());
 
 
 
  Thanks & Regards
  Brahma Reddy Battula
  
  From: Chris Nauroth [cnaur...@hortonworks.com]
  Sent: Friday, May 01, 2015 4:32 AM
  To: mapreduce-dev@hadoop.apache.org; common-...@hadoop.apache.org;
  yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
  Subject: Re: Planning Hadoop 2.6.1 release
 
  Thank you, Arpit.  In addition, I suggest we include the following:
 
  HADOOP-11333. Fix deadlock in DomainSocketWatcher when the notification
  pipe is full
  HADOOP-11604. Prevent ConcurrentModificationException while closing
 domain
  sockets during shutdown of DomainSocketWatcher thread.
  HADOOP-11648. Set DomainSocketWatcher thread name explicitly
  HADOOP-11802. DomainSocketWatcher thread terminates sometimes after
 there
  is an I/O error during requestShortCircuitShm
 
  HADOOP-11604 and 11648 are not critical by themselves, but they are
  pre-requisites to getting a clean cherry-pick of 11802, which we believe
  finally fixes the root cause of this issue.
 
 
  --Chris Nauroth
 
 
 
 
  On 4/30/15, 3:55 PM, Arpit Agarwal aagar...@hortonworks.com wrote:
 
   HDFS candidates for back-porting to Hadoop 2.6.1. The first two were
  requested in [1].
 
  HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream
  should be non static
  HADOOP-11710. Make CryptoOutputStream behave like DFSOutputStream wrt
  synchronization
 
  HDFS-7009. Active NN and standby NN have different live nodes.
  HDFS-7035. Make adding a new data directory to the DataNode an atomic
 and
  improve error handling
  HDFS-7425. NameNode block deletion logging uses incorrect appender.
  HDFS-7443. Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate
  block files are present in the same volume.
  HDFS-7489. Incorrect locking in FsVolumeList#checkDirs can hang
 datanodes
  HDFS-7503. Namenode restart after large deletions can cause slow
  processReport.
  HDFS-7575. Upgrade should generate a unique storage ID for each volume.
  HDFS-7579. Improve log reporting during block report rpc failure.
  HDFS-7587. Edit log corruption can happen if append fails with a quota
  violation.
  HDFS-7596. NameNode should prune dead storages from storageMap.
  HDFS-7611. deleteSnapshot and delete of a file can leave orphaned
 blocks
  in the blocksMap on NameNode restart.
  HDFS-7714. Simultaneous restart of HA NameNodes and DataNode can cause
  DataNode to register successfully with only one NameNode.
  HDFS-7733. NFS: readdir/readdirplus return null directory attribute on
  failure.
  HDFS-7831. Fix the starting index and end condition of the loop in
  FileDiffList.findEarlierSnapshotBlocks().
  HDFS-7885. Datanode should not trust the generation stamp provided by
  client.
  HDFS-7960. The full block report should prune zombie storages even if
  they're not empty.
  HDFS-8072. Reserved RBW space is not released if client terminates
 while
  writing block.
  HDFS-8127. NameNode 

RE: [VOTE] Release Apache Hadoop 2.7.1 RC0

2015-07-01 Thread Rohith Sharma K S
+1 (non-binding)

Built from source,
deployed in a 4-node cluster in Secure Mode and Non-Secure Mode. 
Tested with Spark and MapReduce applications for RM HA, RM 
work-preserving restart, and NM work-preserving restart.
  
- Rohith Sharma K S

-Original Message-
From: Mit Desai [mailto:mitdesa...@gmail.com] 
Sent: 30 June 2015 23:33
To: hdfs-...@hadoop.apache.org
Cc: common-...@hadoop.apache.org; yarn-...@hadoop.apache.org; 
mapreduce-dev@hadoop.apache.org
Subject: Re: [VOTE] Release Apache Hadoop 2.7.1 RC0

+1 (non-binding)

+ Built from source
+ Verified signatures
+ Deployed on a single node cluster
+ Ran some sample jobs to successful completion

Thanks for driving the release Vinod!

-Mit Desai


On Tue, Jun 30, 2015 at 12:51 PM, Varun Vasudev vvasu...@apache.org wrote:

 +1 (non-binding)

 Built from source, deployed in a single node cluster and ran some test 
 jobs.

 -Varun



 On 6/30/15, 9:58 AM, Zhijie Shen zs...@hortonworks.com wrote:

 +1 (binding)
 
 Built from source, deployed a single node cluster and tried some MR jobs.
 
 - Zhijie
 
 From: Devaraj K deva...@apache.org
 Sent: Monday, June 29, 2015 9:24 PM
 To: common-...@hadoop.apache.org
 Cc: hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
 mapreduce-dev@hadoop.apache.org
 Subject: Re: [VOTE] Release Apache Hadoop 2.7.1 RC0
 
 +1 (non-binding)
 
 Deployed in a 3 node cluster and ran some Yarn Apps and MR examples, 
 works fine.
 
 
 On Tue, Jun 30, 2015 at 1:46 AM, Xuan Gong xg...@hortonworks.com wrote:
 
  +1 (non-binding)
 
  Compiled and deployed a single node cluster, ran all the tests.
 
 
  Xuan Gong
 
  On 6/29/15, 1:03 PM, Arpit Gupta ar...@hortonworks.com wrote:
 
  +1 (non binding)
  
  We have been testing rolling upgrades and downgrades from 2.6 to 
  this release and have had successful runs.
  
  --
  Arpit Gupta
  Hortonworks Inc.
  http://hortonworks.com/
  
   On Jun 29, 2015, at 12:45 PM, Lei Xu l...@cloudera.com wrote:
  
   +1 binding
  
   Downloaded src and bin distribution, verified md5, sha1 and 
   sha256 checksums of both tar files.
   Built src using mvn package.
   Ran a pseudo HDFS cluster
   Ran dfs -put some files, and checked files on NN's web interface.
  
  
  
   On Mon, Jun 29, 2015 at 11:54 AM, Wangda Tan 
  wheele...@gmail.com
  wrote:
   +1 (non-binding)
  
   Compiled and deployed a single node cluster, tried to change 
  node labels  and run distributed_shell with node label 
  specified.
  
   On Mon, Jun 29, 2015 at 10:30 AM, Ted Yu yuzhih...@gmail.com
 wrote:
  
   +1 (non-binding)
  
    Compiled hbase branch-1 with Java 1.8.0_45. Ran unit test suite, 
    which passed.
  
   On Mon, Jun 29, 2015 at 7:22 AM, Steve Loughran 
  ste...@hortonworks.com
   wrote:
  
  
   +1 binding from me.
  
   Tests:
  
   Rebuild slider with Hadoop.version=2.7.1; ran all the tests
 including
   against a secure cluster.
   Repeated for windows running Java 8.
  
   All tests passed
  
  
   On 29 Jun 2015, at 09:45, Vinod Kumar Vavilapalli 
  vino...@apache.org
   wrote:
  
   Hi all,
  
   I've created a release candidate RC0 for Apache Hadoop 2.7.1.
  
   As discussed before, this is the next stable release to 
   follow up
   2.6.0,
   and the first stable one in the 2.7.x line.
  
   The RC is available for validation at:
   *http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/
   http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/*
  
   The RC tag in git is: release-2.7.1-RC0
  
   The maven artifacts are available via repository.apache.org 
   at
    https://repository.apache.org/content/repositories/orgapachehadoop-1019/
  
   Please try the release and vote; the vote will run for the 
   usual
 5
   days.
  
   Thanks,
   Vinod
  
   PS: It took 2 months instead of the planned [1] 2 weeks in
 getting
  this
   release out: post-mortem in a separate thread.
  
   [1]: A 2.7.1 release to follow up 2.7.0 
   http://markmail.org/thread/zwzze6cqqgwq4rmw
  
  
  
  
  
  
   --
   Lei (Eddy) Xu
   Software Engineer, Cloudera
  
  
  
 
 
 
 
 --
 
 
 Thanks
 Devaraj K




Re: [VOTE] Release Apache Hadoop 2.7.1 RC0

2015-07-01 Thread Arpit Agarwal
Vinod, thanks for putting together this release.



+1 (binding)

 - Verified signatures
 - Installed binary release on Centos 6 pseudo cluster
* Copied files in and out of HDFS using the shell
* Mounted HDFS via NFS and copied a 10GB file in and out over NFS
* Ran example MapReduce jobs
 - Deployed pseudo cluster from sources on Centos 6, verified
   native bits
 - Deployed pseudo cluster from sources on Windows 2008 R2, verified 
   native bits and ran example MR jobs

Arpit


On 6/29/15, 1:45 AM, Vinod Kumar Vavilapalli vino...@apache.org wrote:

Hi all,

I've created a release candidate RC0 for Apache Hadoop 2.7.1.

As discussed before, this is the next stable release to follow up 2.6.0,
and the first stable one in the 2.7.x line.

The RC is available for validation at:
*http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/
http://people.apache.org/~vinodkv/hadoop-2.7.1-RC0/*

The RC tag in git is: release-2.7.1-RC0

The maven artifacts are available via repository.apache.org at
*https://repository.apache.org/content/repositories/orgapachehadoop-1019/
https://repository.apache.org/content/repositories/orgapachehadoop-1019/*

Please try the release and vote; the vote will run for the usual 5 days.

Thanks,
Vinod

PS: It took 2 months instead of the planned [1] 2 weeks in getting this
release out: post-mortem in a separate thread.

[1]: A 2.7.1 release to follow up 2.7.0
http://markmail.org/thread/zwzze6cqqgwq4rmw