[jira] Created: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java
AccessControlList.toString() is used for serialization of ACL in JobStatus.java
-------------------------------------------------------------------------------

                 Key: MAPREDUCE-1780
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Ravi Gummadi


HADOOP-6715 was created to fix AccessControlList.toString() for the WILDCARD
case. JobStatus.write() and JobStatus.readFields() assume that toString()
returns the serialized form of the AccessControlList object, which is not
true. Once HADOOP-6715 is fixed in COMMON, JobStatus.write() and
JobStatus.readFields() should be updated to match that fix.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
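For context, the distinction the issue is about can be sketched with a minimal, self-contained stand-in. AclSketch below is hypothetical and is not Hadoop's actual AccessControlList API; it only illustrates the general pattern: a Writable-style write()/readFields() pair serializes the object's state explicitly, whereas toString() is a display format that need not round-trip (the WILDCARD case being the one HADOOP-6715 covers).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for an ACL: a wildcard flag plus a users/groups
// string. Field names and layout are illustrative only.
class AclSketch {
    boolean allAllowed;  // the WILDCARD ("*") case
    String aclString;    // e.g. "user1,user2 group1" otherwise

    AclSketch(boolean allAllowed, String aclString) {
        this.allAllowed = allAllowed;
        this.aclString = aclString;
    }

    // Explicit serialization: write the object's state directly instead of
    // relying on toString(), whose output is for display and may not
    // round-trip for every case.
    void write(DataOutput out) throws IOException {
        out.writeBoolean(allAllowed);
        out.writeUTF(aclString);
    }

    static AclSketch readFields(DataInput in) throws IOException {
        boolean all = in.readBoolean();
        String acl = in.readUTF();
        return new AclSketch(all, acl);
    }

    public static void main(String[] args) throws IOException {
        // Round-trip the wildcard case through the explicit format.
        AclSketch original = new AclSketch(true, "*");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bos));
        AclSketch copy = AclSketch.readFields(
                new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println("allAllowed=" + copy.allAllowed
                + " aclString=" + copy.aclString);
    }
}
```

The point of the sketch: with an explicit wire format, readFields() recovers exactly the state that write() emitted, independent of whatever toString() happens to print.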
Re: Minutes: Hadoop Contributor Meeting 05/06/2010
Not sure why my attachment didn't make it to the list. Anyway, I've posted
Arun's notes on the wiki at
http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100506, and included
the content of my slide there. (Attachments on the wiki have been disabled -
as of today apparently, see SVN commit r775220 - so I wasn't able to post the
slide there either.)

Tom

On Fri, May 7, 2010 at 9:36 AM, Tom White t...@cloudera.com wrote:
Here's my (single) slide about the 0.21 release.

Tom

On Thu, May 6, 2010 at 5:38 PM, Arun C Murthy acmur...@gmail.com wrote:

# Shared goals
- Hadoop is HDFS and Map-Reduce in the context of this set of slides

# Priorities
* Yahoo
 - Correctness
 - Availability: not the same as high availability (6 9s etc.), i.e. SPOFs
 - API compatibility
 - Scalability
 - Operability
 - Performance
 - Innovation
* Cloudera
 - Test coverage, API coverage
 - APL-licensed codec (lzo replacement)
 - Security
 - Wire compatibility
 - Cluster-wide resource availability
 - New APIs (FileContext, MR Context Objs.), documentation of their advantages
 - HDFS to better support non-MR use cases
 - Cluster metrics hooks
 - MR modularity (package)
* Facebook
 - Correctness
 - Availability, High Availability, Failover, Continuous Availability
 - Scalability

# Bar for patches/features keeps going higher as the project matures
- Build consensus (e.g. Python Enhancement Process, JSR, etc.)
- Run/test on your own to prove the concept/feature, or branch and finish
- Early versions of libraries should be started outside of the project
  (github etc.), e.g. input formats, compression codecs, etc.
- github for all the above
- Prune contrib

# Maven for packaging

# Tom: hadoop-0.21 (Tom - can you please post your slides? Thanks!)

# Owen: Release Manager (see slides)

# Agenda for next meeting
- Eli: Hadoop Enhancement Process (modelled on PEP?)
- Branching strategies: development models

Arun
[jira] Created: (MAPREDUCE-1781) option -D mapred.tasktracker.map.tasks.maximum=1 does not work when no of mappers is bigger than no of nodes - always spawns 2 mappers/node
option -D mapred.tasktracker.map.tasks.maximum=1 does not work when no of
mappers is bigger than no of nodes - always spawns 2 mappers/node
-----------------------------------------------------------------

                 Key: MAPREDUCE-1781
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.20.2
         Environment: Debian Lenny x64, Hadoop 0.20.2, 2GB RAM
            Reporter: Tudor Vlad


Hello

I am a new user of Hadoop and I'm having trouble using Hadoop Streaming with
the -D mapred.tasktracker.map.tasks.maximum option. I'm experimenting with an
unmanaged application (C++) which I want to run over several nodes in two
scenarios:
1) the number of maps (input splits) is equal to the number of nodes
2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)

Initially, when running the tests in scenario 1, I would sometimes get 2
processes/node on half the nodes. I fixed this by adding the option
-D mapred.tasktracker.map.tasks.maximum=1, and everything worked fine.

In scenario 2 (more maps than nodes) this setting no longer works; I always
get 2 processes/node. I even tested with maximum=5 and I still get 2
processes/node.

The entire command I use is:

/usr/bin/time --format=-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w \
/opt/hadoop/bin/hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
 -D mapred.tasktracker.map.tasks.maximum=1 \
 -D mapred.map.tasks=30 \
 -D mapred.reduce.tasks=0 \
 -D io.file.buffer.size=5242880 \
 -libjars /opt/hadoop/contrib/streaming/hadoop-7debug.jar \
 -input input/test \
 -output out1 \
 -mapper /opt/jobdata/script_1k \
 -inputformat me.MyInputFormat

Why is this happening, and how can I make it work properly (i.e. be able to
limit exactly how many mappers I can have at one time per node)?

Thank you in advance
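For context on where this property is normally read from: in Hadoop 0.20, mapred.tasktracker.map.tasks.maximum is a per-TaskTracker setting that each daemon reads from its own configuration at startup, so values passed per-job with -D are generally not expected to take effect. A minimal fragment for the TaskTracker's conf/mapred-site.xml (the value shown is illustrative):

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml on each TaskTracker node; requires a
     TaskTracker restart to take effect. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
    <description>Maximum number of map task slots this TaskTracker
    runs simultaneously (illustrative value).</description>
  </property>
</configuration>
```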
[jira] Created: (MAPREDUCE-1782) GlobPath support for har
GlobPath support for har
------------------------

                 Key: MAPREDUCE-1782
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1782
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: harchive
    Affects Versions: 0.20.1
            Reporter: Santhosh Srinivasan


When a fully qualified path to a har file is used, FileSystem.globStatus()
returns null. Please see the attached test case.