[jira] Created: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java

2010-05-10 Thread Ravi Gummadi (JIRA)
AccessControlList.toString() is used for serialization of ACL in JobStatus.java
---

 Key: MAPREDUCE-1780
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Gummadi


HADOOP-6715 was created to fix AccessControlList.toString() for the WILDCARD 
case. JobStatus.write() and readFields() assume that toString() returns the 
serialized String of the AccessControlList object, which is not true. Once 
HADOOP-6715 is fixed in COMMON, JobStatus.write() and JobStatus.readFields() 
should be updated to match that fix.
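The round-trip problem can be sketched with a minimal example. This is not the actual Hadoop code; SimpleAcl and its methods are hypothetical stand-ins that only illustrate why a human-readable toString() (which HADOOP-6715 makes print a phrase for the wildcard) cannot double as a serialization format:

```java
// Hypothetical minimal ACL illustrating the toString()-as-serialization bug.
class SimpleAcl {
    static final String WILDCARD = "*";
    private final String spec;
    SimpleAcl(String spec) { this.spec = spec; }

    // Human-readable form, like the post-HADOOP-6715 toString():
    public String toString() {
        return spec.equals(WILDCARD) ? "All users are allowed" : spec;
    }

    // Serialization must use the raw spec, not toString():
    String serialize() { return spec; }
    static SimpleAcl deserialize(String s) { return new SimpleAcl(s); }
}

public class AclRoundTrip {
    public static void main(String[] args) {
        SimpleAcl acl = new SimpleAcl(SimpleAcl.WILDCARD);
        // Round-tripping through toString() turns "*" into a phrase:
        SimpleAcl viaToString = SimpleAcl.deserialize(acl.toString());
        // Round-tripping through serialize() preserves the wildcard:
        SimpleAcl viaSpec = SimpleAcl.deserialize(acl.serialize());
        System.out.println(viaToString.serialize().equals(SimpleAcl.WILDCARD)); // false
        System.out.println(viaSpec.serialize().equals(SimpleAcl.WILDCARD));     // true
    }
}
```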

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Minutes: Hadoop Contributor Meeting 05/06/2010

2010-05-10 Thread Tom White
Not sure why my attachment didn't make it to the list. Anyway, I've
posted Arun's notes on the wiki at
http://wiki.apache.org/hadoop/HadoopContributorsMeeting20100506, and
included the content of my slide there. (Attachments on the wiki have
been disabled - as of today apparently, see SVN commit r775220 - so I
wasn't able to post the slide there either.)

Tom

On Fri, May 7, 2010 at 9:36 AM, Tom White t...@cloudera.com wrote:
 Here's my (single) slide about the 0.21 release.

 Tom

 On Thu, May 6, 2010 at 5:38 PM, Arun C Murthy acmur...@gmail.com wrote:
 # Shared goals
  - Hadoop is HDFS & Map-Reduce in the context of this set of slides
 # Priorities
  * Yahoo
    - Correctness
    - Availability: not the same as high-availability (6 9s etc.), i.e. eliminating SPOFs
    - API Compatibility
    - Scalability
    - Operability
    - Performance
    - Innovation
  * Cloudera
    - Test coverage, api coverage
    - APL Licensed codec (lzo replacement)
    - Security
    - Wire compatibility
    - Cluster-wide resource availability
    - New APIs (FileContext, MR Context Objs.), documentation of their advantages
    - HDFS to better support non-MR use-cases
    - Cluster metrics hooks
    - MR modularity (package)
  * Facebook
    - Correctness
    - Availability, High Availability, Failover, Continuous Availability
    - Scalability
 # Bar for patches/features keeps going higher as the project matures
  - Build consensus (e.g. Python Enhancement Proposal (PEP) process, JSR etc.)
  - Run/test on your own to prove the concept/feature or branch and finish
  - Early versions of libraries should be started outside of the project (github etc.), e.g. input-formats, compression-codecs etc.
    - github for all the above
    - Prune contrib
    - Prune contrib
 # Maven for packaging
 # Tom: hadoop-0.21 (Tom - can you please post your slides? Thanks!)
 # Owen: Release Manager (see slides)
 # Agenda for next meeting
  - Eli: Hadoop Enhancement Process (modelled on PEP?)
  - Branching strategies: Development Models

 Arun


[jira] Created: (MAPREDUCE-1781) option -D mapred.tasktracker.map.tasks.maximum=1 does not work when number of mappers is bigger than number of nodes - always spawns 2 mappers/node

2010-05-10 Thread Tudor Vlad (JIRA)
option -D mapred.tasktracker.map.tasks.maximum=1 does not work when number of 
mappers is bigger than number of nodes - always spawns 2 mappers/node


 Key: MAPREDUCE-1781
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.2
 Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
Reporter: Tudor Vlad


Hello

I am a new user of Hadoop and I am having some trouble using Hadoop Streaming 
and the -D mapred.tasktracker.map.tasks.maximum option. 

I'm experimenting with an unmanaged application (C++) which I want to run over 
several nodes in 2 scenarios:
1) the number of maps (input splits) is equal to the number of nodes
2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)

Initially, when running the tests in scenario 1, I would sometimes get 2 
processes/node on half the nodes. However, I fixed this by adding the option -D 
mapred.tasktracker.map.tasks.maximum=1, so everything works fine.

In the case of scenario 2 (more maps than nodes) this directive no longer 
works: I always obtain 2 processes/node. I even tested with maximum=5 and I 
still get 2 processes/node.

The entire command I use is:

/usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w" \
 /opt/hadoop/bin/hadoop jar \
 /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
 -D mapred.tasktracker.map.tasks.maximum=1 \
 -D mapred.map.tasks=30 \
 -D mapred.reduce.tasks=0 \
 -D io.file.buffer.size=5242880 \
 -libjars /opt/hadoop/contrib/streaming/hadoop-7debug.jar \
 -input input/test \
 -output out1 \
 -mapper /opt/jobdata/script_1k \
 -inputformat me.MyInputFormat

Why is this happening, and how can I make it work properly (i.e. be able to 
limit exactly how many mappers I can have at one time per node)?

Thank you in advance
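A likely explanation, sketched here as an assumption rather than a confirmed diagnosis: in Hadoop 0.20, mapred.tasktracker.map.tasks.maximum is read by each TaskTracker daemon at startup, not per submitted job, so passing it with -D at submission time has no effect once there are enough waiting maps to fill the default of 2 slots per node. It would instead be set in mapred-site.xml on every worker node, followed by a TaskTracker restart:

```xml
<!-- mapred-site.xml on each worker node; requires restarting the TaskTracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
```

With only one map slot per node, scenario 1 happened to work because there were never enough queued maps to fill a second slot, which is why the -D option merely appeared to work there.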




[jira] Created: (MAPREDUCE-1782) GlobPath support for har

2010-05-10 Thread Santhosh Srinivasan (JIRA)
GlobPath support for har


 Key: MAPREDUCE-1782
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1782
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Affects Versions: 0.20.1
Reporter: Santhosh Srinivasan


When a fully qualified path to a har file is used, FileSystem.globStatus() 
returns null. Please see the attached test case.
