[jira] [Created] (MAPREDUCE-5278) Perf: Distributed cache is broken when JT staging dir is not on the default FS

2013-05-28 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5278:
--

 Summary: Perf: Distributed cache is broken when JT staging dir is not on the default FS
 Key: MAPREDUCE-5278
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5278
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1-win
 Environment: Windows
Reporter: Xi Fang


Today, we set the JobTracker staging dir
(mapreduce.jobtracker.staging.root.dir) to point to HDFS even though ASV is
the default file system. There are a few reasons why this config was chosen:
1. To prevent leaking the storage account credentials to the user's storage
account (in other words, keep job.xml in the cluster). This is needed until
HADOOP-444 is fixed.
2. It uses HDFS for the transient job files, which is good for two reasons:
a) it does not flood the user's storage account with irrelevant data/files,
and b) it leverages HDFS locality for small files.
However, this approach conflicts with how distributed cache caching works, 
completely negating the feature's functionality.
When files are added to the distributed cache (through the -files, -archives
and -libjars Hadoop generic options), they are copied to the JobTracker
staging dir only if they reside on a file system different from the
JobTracker's. Later on, this path is used as a key to cache the files locally
on the TaskTracker's machine, and to avoid localization (download/unzip) of
the distributed cache files if they are already localized.
In our configuration the caching is effectively disabled: the files live on
ASV while the staging dir is on HDFS, so they are always copied into the
per-job staging dir and the cache key differs for every job. We always end up
copying distributed cache files to the JT staging dir first and localizing
them on the TaskTracker machine second.
This is especially bad for Oozie scenarios, as Oozie uses the distributed
cache to populate Hive/Pig jars throughout the cluster.
An easy workaround is to configure mapreduce.jobtracker.staging.root.dir in
mapred-site.xml to point to the default FS.
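A minimal Java sketch of the caching behavior described above, under stated assumptions: DistCacheModel, cacheKeyFor and localizations are hypothetical names, the "same file system" test is simplified to comparing URI scheme and authority, and the per-job layout (staging root, then job id, then files/) is an illustrative stand-in for the real staging layout. It models the decision only; it is not Hadoop's JobClient code.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of the submit-time decision described in MAPREDUCE-5278.
// NOT Hadoop's JobClient code; names and the staging layout are assumptions.
class DistCacheModel {

    // Simplification: two URIs are on the same file system when their
    // scheme and authority match.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
                && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    // Returns the path that becomes the TaskTracker-side cache key for one job.
    // stagingRoot is expected to end with '/' so that resolve() appends to it.
    static URI cacheKeyFor(URI file, URI stagingRoot, String jobId) {
        if (sameFileSystem(file, stagingRoot)) {
            // No copy: the original path is the key, identical for every job.
            return file;
        }
        // Copied into a per-job directory, so the key changes with each job.
        String path = file.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        return stagingRoot.resolve(jobId + "/files/" + name);
    }

    // Counts how often the TaskTracker would localize (download/unzip) the
    // file across several jobs that all reference the same original path.
    static int localizations(URI file, URI stagingRoot, int jobs) {
        Set<URI> alreadyLocalized = new HashSet<>();
        int count = 0;
        for (int i = 1; i <= jobs; i++) {
            URI key = cacheKeyFor(file, stagingRoot, "job_" + i);
            if (alreadyLocalized.add(key)) {
                count++; // cache miss: this key has not been seen before
            }
        }
        return count;
    }
}
```

With the staging root on hdfs:// and a jar on asv://, three submissions of the same jar yield three different keys and three localizations; pointing the staging root at the ASV default FS (the workaround) keeps the original path as the key, so the jar is localized once.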

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5279) mapreduce scheduling deadlock

2013-05-28 Thread PengZhang (JIRA)
PengZhang created MAPREDUCE-5279:


 Summary: mapreduce scheduling deadlock
 Key: MAPREDUCE-5279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, scheduler
Affects Versions: 2.0.3-alpha
Reporter: PengZhang


YARN-2 introduced CPU as a scheduling dimension, but the MR
RMContainerAllocator doesn't take virtual cores into account while scheduling
reduce tasks. This may cause too many reduce tasks to be scheduled, because
memory alone appears sufficient. On a small cluster this ends in deadlock:
all running containers are reduce tasks, but the map phase is not finished.
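As a hedged illustration of the failure mode, the hypothetical Java model below contrasts a memory-only headroom check with one that also counts virtual cores. ResourceCheckModel, containersThatFit and shouldPreemptReducers are invented names; this is not RMContainerAllocator's code, only a sketch of why ignoring vcores can leave pending maps starved behind running reducers.

```java
// Hypothetical model; not the RMContainerAllocator implementation.
class ResourceCheckModel {

    // How many containers of the asked size fit in the headroom.
    // askMb and askVcores are assumed to be positive.
    // When countVcores is false only memory is considered (the behavior the
    // report describes); otherwise the scarcer resource decides.
    static long containersThatFit(long headroomMb, int headroomVcores,
                                  long askMb, int askVcores, boolean countVcores) {
        long byMemory = headroomMb / askMb;
        if (!countVcores) {
            return byMemory;
        }
        long byVcores = headroomVcores / askVcores;
        return Math.min(byMemory, byVcores);
    }

    // Running reducers should give way when maps are pending but not even
    // one map container fits in the current headroom.
    static boolean shouldPreemptReducers(int pendingMaps, long headroomMb, int headroomVcores,
                                         long mapMb, int mapVcores, boolean countVcores) {
        return pendingMaps > 0
                && containersThatFit(headroomMb, headroomVcores, mapMb, mapVcores, countVcores) < 1;
    }
}
```

For example, with 4096 MB but 0 vcores of headroom and 1024 MB / 1 vcore maps pending, the memory-only check sees room for four maps and never preempts a reducer, while the vcore-aware check sees that no map fits and would free a container.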



Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-28 Thread Thomas Graves
The vote passed with 15 +1's (9 binding) and 0 -1's.  I will start the
release today.

Thanks,
Tom

On 5/17/13 4:10 PM, Thomas Graves tgra...@yahoo-inc.com wrote:

Hello all,

We've had a few critical issues come up in 0.23.7 that I think warrant a
0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
other issues that I want to finish up and get in before we spin it.  Those
include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
to finish up early next week, so I hope to spin 0.23.8 soon after this
vote completes.

Please vote '+1' to approve this plan. Voting will close on Friday May
24th at 2:00pm PDT.

Thanks,
Tom Graves




[VOTE] Release Apache Hadoop 0.23.8

2013-05-28 Thread Thomas Graves

I've created a release candidate (RC0) for hadoop-0.23.8 that I would like
to release.

This release is a sustaining release with several important bug fixes in
it.  The most critical one is MAPREDUCE-5211.

The RC is available at:
http://people.apache.org/~tgraves/hadoop-0.23.8-candidate-0/
The RC tag in svn is here:
http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.8-rc0/

The maven artifacts are available via repository.apache.org.

Please try the release and vote; the vote will run for the usual 7 days.

I am +1 (binding).

thanks,
Tom Graves



Re: [VOTE] Release Apache Hadoop 2.0.4.1-alpha

2013-05-28 Thread Alejandro Abdelnur
+1, verified MD5 and signature. Did a full build, started a pseudo-cluster,
ran a few MR jobs, verified that HttpFS works.

Thanks.


On Sat, May 25, 2013 at 10:01 AM, Sangjin Lee sj...@apache.org wrote:

 +1 (non-binding)

 Thanks,
 Sangjin


 On Fri, May 24, 2013 at 8:48 PM, Konstantin Boudnik c...@apache.org wrote:

  All,

  I have created a release candidate (rc0) for hadoop-2.0.4.1-alpha that I
  would like to release.

  This is a stabilization release that includes fixes for a couple of
  issues discovered while testing the BigTop 0.6.0 release candidate.

  The RC is available at:
  http://people.apache.org/~cos/hadoop-2.0.4.1-alpha-rc0/
  The RC tag in svn is here:
  http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4.1-alpha-rc0

  The maven artifacts are available via repository.apache.org.

  Please try the release bits and vote; the vote will run for the usual 7
  days.

  Thanks for voting,
  Cos
 
 




-- 
Alejandro


Re: [VOTE] Release Apache Hadoop 0.23.8

2013-05-28 Thread Alejandro Abdelnur
+1, verified MD5 and signature. Did a full build, started a pseudo-cluster,
ran a few MR jobs, verified that HttpFS works.

Thanks.


-- 
Alejandro


Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-05-28 Thread MHPC 2013
We apologize if you receive multiple copies of this message.

===

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINES:

June 10, 2013 - LNCS Full paper submission (extended)
June 28, 2013 - Lightning Talk abstracts


SCOPE

Extremely large, diverse, and complex data sets are generated from
scientific applications, the Internet, social media and other applications.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
presents major challenges. Processing such amounts of data efficiently
has been an obstacle to scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable poses unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data with regard to middleware handling
and optimizations. The scope covers existing and proposed middleware for HPC
and Big Data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievements and innovative ideas. The workshop also offers a dedicated forum
for these researchers to assess the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate in strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20-minute paper
presentations, each followed by a 10-minute discussion.
Presentations may be accompanied by interactive demonstrations.


TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive,
Pig, Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
June 10, 2013 - Full paper submission (extended)
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date


TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guo, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory, USA

Re: [VOTE] Release Apache Hadoop 2.0.4.1-alpha

2013-05-28 Thread Chris Douglas
+1

Checksum and signature match, ran some unit tests, verified with a diff
against release-2.0.4-alpha that the release contains MAPREDUCE-5240 and
HADOOP-9407, plus some fixups to the release notes. -C




5 second minimum shuffle time

2013-05-28 Thread Kay Ousterhout
Hi,

I'm running v0.23 in a large cluster, and have found that the shuffle time
for reduce tasks is always at least 5 seconds, even when the amount of data
read by the reduce task is tiny (e.g., just 18 bytes).  This shuffle time
floor suggests that there's a heartbeat interval or something that has to
elapse before the shuffle begins, but I can't find any sign of such a delay
in the code base.  Can anyone shed some light on why this is occurring?

Thanks,
Kay


[jira] [Reopened] (MAPREDUCE-5036) Default shuffle handler port should not be 8080

2013-05-28 Thread Sandy Ryza (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza reopened MAPREDUCE-5036:
---


 Default shuffle handler port should not be 8080
 ---

 Key: MAPREDUCE-5036
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5036
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.0.5-beta

 Attachments: MAPREDUCE-5036-13562.patch, MAPREDUCE-5036.patch


 The shuffle handler port (mapreduce.shuffle.port) defaults to 8080.  This is 
 a pretty common port for web services, and is likely to cause unnecessary 
 port conflicts.
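To make the conflict concrete, a small Java sketch: a process that tries to bind a port another service already holds (for example a web server on 8080) fails with an IOException, typically a BindException. PortCheck and isPortFree are hypothetical helpers, not part of Hadoop's shuffle handler, and a successful check is only a snapshot, since another process can take the port right afterwards.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical pre-flight helper; not part of Hadoop.
class PortCheck {

    // Returns true if a listening socket can be bound to the port on all
    // local interfaces right now. The answer may be stale immediately,
    // because another process can bind the port after this check.
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true; // bound successfully; the socket is closed on exit
        } catch (IOException e) {
            // BindException (a subclass of IOException) when the port is taken.
            return false;
        }
    }
}
```

Running isPortFree(8080) on a node that already hosts a web service would return false, which is the kind of clash a default mapreduce.shuffle.port of 8080 invites.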
