Re: hadoop-2.4.1

2014-05-20 Thread Akira AJISAKA

Hi Arun,

I'd like to know when to release Hadoop 2.4.1.
It looks like all of the blockers have been resolved.

Thanks,
Akira

(2014/04/24 5:59), Arun C Murthy wrote:

Folks,

  Here is a handy short-cut to track 2.4.1: 
http://s.apache.org/hadoop-2.4.1-blockers

  I'm hoping we can get the majority of this in by end-of-week and have an RC 
for consideration.

  Committers - I'd appreciate it if you could treat review/commit of these as 
high-priority. Also, please feel free to add other *really* important fixes 
you'd like to see - let's also try to be super cautious about adding new content.

thanks,
Arun






Re: hadoop-2.4.1

2014-05-20 Thread Steve Loughran
I'd like to see YARN-2065 fixed - without it, the AM restart feature
doesn't work. Or at least it works, but you can't create new containers.


On 20 May 2014 07:40, Akira AJISAKA ajisa...@oss.nttdata.co.jp wrote:

 Hi Arun,

 I'd like to know when to release Hadoop 2.4.1.
 It looks like all of the blockers have been resolved.

 Thanks,
 Akira


 (2014/04/24 5:59), Arun C Murthy wrote:

 Folks,

   Here is a handy short-cut to track 2.4.1: http://s.apache.org/hadoop-2.4.1-blockers

   I'm hoping we can get the majority of this in by end-of-week and have
 an RC for consideration.

   Committers - I'd appreciate it if you could treat review/commit of these
 as high-priority. Also, please feel free to add other *really* important
 fixes you'd like to see - let's also try to be super cautious about adding
 new content.

 thanks,
 Arun







[jira] [Created] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-20 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created MAPREDUCE-5895:
-

 Summary: Temporary Index File can not be cleaned up because 
OutputStream doesn't close properly
 Key: MAPREDUCE-5895
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0
Reporter: Kousuke Saruta


In TaskLog.java, the temporary index file is created by the following code.

{code}
BufferedOutputStream bos =
    new BufferedOutputStream(
        SecureIOUtils.createForWrite(tmpIndexFile, 0644));
DataOutputStream dos = new DataOutputStream(bos);
{code}

The code is surrounded by a try-finally block, but if an Exception or Error is 
thrown between constructing bos and dos, the temporary file is not cleaned up.
I hit a situation where a thread got an OOM after bos had been created, and the 
temporary file was left behind. Later, another thread executed the same logic 
and failed with FileAlreadyExistsException.
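
For illustration, a minimal sketch (not the actual fix) of one way to make sure 
the underlying stream is closed even when construction fails halfway; the 
tmpIndexFile and the surrounding method are assumed to be the ones in TaskLog.

{code}
BufferedOutputStream bos = null;
DataOutputStream dos = null;
try {
  bos = new BufferedOutputStream(
      SecureIOUtils.createForWrite(tmpIndexFile, 0644));
  dos = new DataOutputStream(bos);
  // ... write the index entries ...
} finally {
  // Close whichever stream was actually constructed: closing dos also closes
  // the wrapped bos, but if dos was never created (e.g. an Error was thrown
  // right after createForWrite), bos still has to be closed explicitly.
  if (dos != null) {
    dos.close();
  } else if (bos != null) {
    bos.close();
  }
}
{code}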



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Problems setting custom class for the Pluggable Sort in MapReduce Next Generation

2014-05-20 Thread Pedro Dusso
Hello,

I'm developing a custom map output buffer which uses replacement selection
instead of quicksort. It's available here:
https://bitbucket.org/pmdusso/hadoop-replacement-selection-sort/overview.
It is based on the new pluggable interface from MAPREDUCE-2454
(https://issues.apache.org/jira/browse/MAPREDUCE-2454).

I've been testing it in a single-node installation with success. I
configure the job during its creation like this:

  conf.set("io.serializations",
      "io.serialization.WritableSerializationWithZeroEndingText");
  conf.set("mapreduce.job.map.output.collector.class",
      "pluggable.MapOutputHeapWithMetadataHeap");

I used to generate a runnable jar and run it normally with java -jar ...  But
now I would like to try it on a multi-node cluster (which is working with
normal jobs). I removed the hardcoded configuration and started calling the
jar like this:

hadoop jar jars/wordCount.jar \
    -Dmapreduce.task.io.sort.mb=16 \
    -Dmapreduce.job.map.output.collector.class=pluggable.MapOutputHeapWithMetadataHeap \
    -Dio.serializations=io.serialization.WritableSerializationWithZeroEndingText \
    /wordcount/words /wordcount/output/out

But I can't get this to work. I keep getting a ClassNotFoundException:

Error: java.lang.RuntimeException: java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class
pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:383)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException:
Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: Class
pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
    ... 11 more

I have two projects: one for jobs like wordCount, grep, etc., and one where
I'm developing my custom output buffer (the one in the Bitbucket repository
linked above). Because of this, I tried different jar configurations:

   - The jobs project has a *project* dependency in Eclipse; I export a
     runnable jar with the required libraries packaged, and also with them
     copied as a folder.
   - The jobs project adds a jar generated from the custom output buffer
     project.
   - A fat jar generated with mvn in the jobs project.


All of those failed. I would appreciate any help, since there seems to be very
little information about this online. If I'm missing some important
information, please let me know and I will provide it.
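
For reference, here is a minimal sketch (driver name and jar paths are
hypothetical) of a Tool-based driver that sets the pluggable collector:
generic options such as -D and -libjars only take effect when the main class
goes through ToolRunner/GenericOptionsParser, and the jar containing the
collector has to reach the task classpath, for example via -libjars. Note also
that the value of mapreduce.job.map.output.collector.class should be the plain
fully-qualified class name, with no trailing ".class".

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already holds whatever -D options ToolRunner parsed.
    Configuration conf = getConf();
    conf.set("mapreduce.job.map.output.collector.class",
        "pluggable.MapOutputHeapWithMetadataHeap");
    Job job = Job.getInstance(conf, "wordcount");
    job.setJarByClass(WordCountDriver.class);
    // ... set mapper, reducer, input and output paths as usual ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}

launched, for example, as:

hadoop jar jars/wordCount.jar WordCountDriver \
    -libjars /path/to/custom-collector.jar \
    -Dmapreduce.task.io.sort.mb=16 \
    /wordcount/words /wordcount/output/out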

Best regards,

Pedro Martins Dusso


Hadoop-Mapreduce-trunk - Build # 1780 - Still Failing

2014-05-20 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 34442 lines...]
[INFO] Reactor Summary:
[INFO] 
[INFO] hadoop-mapreduce-client ... SUCCESS [2.542s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [51.979s]
[INFO] hadoop-mapreduce-client-common  SUCCESS [26.509s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [4.293s]
[INFO] hadoop-mapreduce-client-app ... SUCCESS [7:04.678s]
[INFO] hadoop-mapreduce-client-hs  SUCCESS [4:42.015s]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [1:36:46.660s]
[INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
[INFO] Apache Hadoop MapReduce Examples .. SKIPPED
[INFO] hadoop-mapreduce .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 1:49:59.706s
[INFO] Finished at: Tue May 20 15:36:52 UTC 2014
[INFO] Final Memory: 29M/100M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-jobclient
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Updating HDFS-6362
Updating HDFS-6361
Updating YARN-2066
Updating HDFS-6287
Updating HDFS-4913
Updating HADOOP-10614
Updating HDFS-5683
Updating HADOOP-10612
Updating MAPREDUCE-5867
Updating HDFS-6250
Updating HADOOP-10489
Updating YARN-2053
Updating HADOOP-10401
Updating HDFS-6397
Updating HADOOP-10586
Updating MAPREDUCE-5861
Updating HDFS-2949
Updating HDFS-6406
Updating HADOOP-10609
Updating MAPREDUCE-5809
Updating HDFS-6345
Updating HDFS-6400
Updating HDFS-6326
Updating HDFS-6325
Updating HDFS-6402
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Created] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory

2014-05-20 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5896:
-

 Summary: Allow InputSplits to indicate which locations have the 
block cached in memory
 Key: MAPREDUCE-5896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza






--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: hadoop-2.4.1

2014-05-20 Thread Arun Murthy
Akira,

 Waiting for one more issue, stay tuned.

thanks,
Arun


On Mon, May 19, 2014 at 11:40 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:

 Hi Arun,

 I'd like to know when to release Hadoop 2.4.1.
 It looks like all of the blockers have been resolved.

 Thanks,
 Akira


 (2014/04/24 5:59), Arun C Murthy wrote:

 Folks,

   Here is a handy short-cut to track 2.4.1: http://s.apache.org/hadoop-2.4.1-blockers

   I'm hoping we can get the majority of this in by end-of-week and have
 an RC for consideration.

   Committers - I'd appreciate it if you could treat review/commit of these
 as high-priority. Also, please feel free to add other *really* important
 fixes you'd like to see - let's also try to be super cautious about adding
 new content.

 thanks,
 Arun






-- 
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



[jira] [Created] (MAPREDUCE-5897) Provide a utility to be able to inspect the config as seen by a hadoop client/daemon

2014-05-20 Thread Gera Shegalov (JIRA)
Gera Shegalov created MAPREDUCE-5897:


 Summary: Provide a utility to be able to inspect the config as seen 
by a hadoop client/daemon
 Key: MAPREDUCE-5897
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5897
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Gera Shegalov
Assignee: Gera Shegalov


To ease debugging of config issues, it is convenient to be able to generate the 
config as seen by the job client or a hadoop daemon.

{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -help
Usage: ConfigTool [ -xml | -json ] [ -loadDefaults ] [ resource1... ]
  if resource contains '/', load from local filesystem
  otherwise, load from the classpath

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be
copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to
include in the classpath.
-archives <comma separated list of archives>    specify comma separated
archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
{noformat}

{noformat}
$ hadoop org.apache.hadoop.util.ConfigTool -Dmy.test.conf=val mapred-site.xml \
    ./hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml | python -mjson.tool
{
    "properties": [
        {
            "isFinal": false,
            "key": "mapreduce.framework.name",
            "resource": "mapred-site.xml",
            "value": "yarn"
        },
        {
            "isFinal": false,
            "key": "mapreduce.client.genericoptionsparser.used",
            "resource": "programatically",
            "value": "true"
        },
        {
            "isFinal": false,
            "key": "my.test.conf",
            "resource": "from command line",
            "value": "val"
        },
        {
            "isFinal": false,
            "key": "from.file.key",
            "resource": "hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml",
            "value": "from.file.val"
        },
        {
            "isFinal": false,
            "key": "mapreduce.shuffle.port",
            "resource": "mapred-site.xml",
            "value": "${my.mapreduce.shuffle.port}"
        }
    ]
}
{noformat}
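
For reference, a similar dump can already be approximated with the existing 
Configuration.dumpConfiguration API; below is a minimal sketch of the idea 
(the class name DumpConf is made up, and this is not the proposed ConfigTool 
implementation):

{code}
import java.io.OutputStreamWriter;
import java.io.Writer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.GenericOptionsParser;

public class DumpConf {
  public static void main(String[] args) throws Exception {
    // Start from an empty config; GenericOptionsParser applies generic options
    // such as -D, -conf, -fs and -jt to it.
    Configuration conf = new Configuration(false);
    String[] resources = new GenericOptionsParser(conf, args).getRemainingArgs();
    for (String r : resources) {
      if (r.contains("/")) {
        conf.addResource(new Path(r));   // treat as a local filesystem path
      } else {
        conf.addResource(r);             // treat as a classpath resource
      }
    }
    Writer out = new OutputStreamWriter(System.out, "UTF-8");
    // Emits JSON of the form {"properties":[{key, value, isFinal, resource}, ...]}
    Configuration.dumpConfiguration(conf, out);
    out.flush();
  }
}
{code}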



--
This message was sent by Atlassian JIRA
(v6.2#6252)