Build failed in Jenkins: Hadoop-Common-0.23-Build #76

2011-11-25 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/76/

--
[...truncated 7744 lines...]
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/target/hadoop-dist-0.23.1-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-dist ---
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-dist ---
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/target/hadoop-dist-0.23.1-SNAPSHOT.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.jar
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.pom
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main  SUCCESS [3.244s]
[INFO] Apache Hadoop Project POM . SUCCESS [0.055s]
[INFO] Apache Hadoop Annotations . SUCCESS [1.758s]
[INFO] Apache Hadoop Project Dist POM  SUCCESS [0.197s]
[INFO] Apache Hadoop Assemblies .. SUCCESS [0.086s]
[INFO] Apache Hadoop Auth  SUCCESS [2.163s]
[INFO] Apache Hadoop Auth Examples ... SUCCESS [1.217s]
[INFO] Apache Hadoop Common .. SUCCESS [24.449s]
[INFO] Apache Hadoop Common Project .. SUCCESS [0.043s]
[INFO] Apache Hadoop HDFS  SUCCESS [19.634s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.040s]
[INFO] hadoop-yarn ... SUCCESS [0.048s]
[INFO] hadoop-yarn-api ... SUCCESS [7.137s]
[INFO] hadoop-yarn-common  SUCCESS [9.139s]
[INFO] hadoop-yarn-server  SUCCESS [0.020s]
[INFO] hadoop-yarn-server-common . SUCCESS [3.113s]
[INFO] hadoop-yarn-server-nodemanager  SUCCESS [5.732s]
[INFO] hadoop-yarn-server-web-proxy .. SUCCESS [1.714s]
[INFO] hadoop-yarn-server-resourcemanager  SUCCESS [7.235s]
[INFO] hadoop-yarn-server-tests .. SUCCESS [0.863s]
[INFO] hadoop-mapreduce-client ... SUCCESS [0.035s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [11.044s]
[INFO] hadoop-yarn-applications .. SUCCESS [0.020s]
[INFO] hadoop-yarn-applications-distributedshell . SUCCESS [2.138s]
[INFO] hadoop-yarn-site .. SUCCESS [0.098s]
[INFO] hadoop-mapreduce-client-common  SUCCESS [6.119s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [1.511s]
[INFO] hadoop-mapreduce-client-app ... SUCCESS [6.043s]
[INFO] hadoop-mapreduce-client-hs  SUCCESS [2.208s]
[INFO] hadoop-mapreduce-client-jobclient . SUCCESS [2.484s]
[INFO] Apache Hadoop MapReduce Examples .. SUCCESS [2.719s]
[INFO] hadoop-mapreduce .. SUCCESS [0.121s]
[INFO] Apache Hadoop MapReduce Streaming . SUCCESS [3.020s]
[INFO] Apache Hadoop Tools ... SUCCESS [0.037s]
[INFO] Apache Hadoop Distribution  SUCCESS [0.068s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 2:06.648s
[INFO] Finished at: Fri Nov 25 09:05:19 UTC 2011
[INFO] Final Memory: 148M/978M
[INFO] 
+ cd hadoop-common-project
+ /home/jenkins/tools/maven/latest/bin/mvn clean verify checkstyle:checkstyle findbugs:findbugs -DskipTests -Pdist -Dtar -Psrc -Pnative -Pdocs
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Apache Hadoop Annotations
[INFO] Apache Hadoop Auth
[INFO] Apache Hadoop Auth Examples
[INFO] Apache Hadoop Common
[INFO] Apache Hadoop Common Project
[INFO] 
[INFO] 
[INFO] Building Apache Hadoop Annotations 0.23.1-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- 

[Update] RE: Blocks are getting corrupted under very high load

2011-11-25 Thread Uma Maheswara Rao G
___
From: Uma Maheswara Rao G
Sent: Thursday, November 24, 2011 7:51 AM
To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: RE: Blocks are getting corrupted under very high load

We could replicate the issue with some test code (without Hadoop). The issue 
looks to be the same one you pointed out.

Thanks Todd.

Finally, we had also started suspecting that angle, and planned to capture the 
file details before and after the reboot. With that analysis I can confirm 
whether it is the same issue or not.

We cannot get the logs from before the reboot, because those logs are being 
lost as well :) . That in itself is further evidence.

One more thing to notice is that the gap between the reboot time and the last 
replica finalization is ~1hr in some cases.
Since the machine was rebooted due to kernel.hung_task_timeout_secs, that 
particular thread in the OS might never have gotten the chance to sync the 
data: the same cause.
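
For reference, watchdog behavior like this is typically configured via sysctl; 
a minimal sketch is below. Only the 480-sec timeout is from our boards; the 
panic/reboot values are illustrative assumptions:

    # /etc/sysctl.conf (sketch; values other than the timeout are assumptions)
    kernel.hung_task_timeout_secs = 480   # flag a task stuck in D-state for 480s
    kernel.hung_task_panic = 1            # panic the kernel when a hung task is found
    kernel.panic = 10                     # reboot automatically 10s after a panic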

HDFS-1539 is a great find. I have merged all the bug fixes, but since 
HDFS-1539 is an improvement rather than a bug, it had not come up in my list :( .

We also found some OS-level configs to make filesystem operations synchronous, 
e.g. the dirsync mount option:

dirsync
  All directory updates within the filesystem should be done synchronously. 
  This affects the following system calls: creat, link, unlink, symlink, 
  mkdir, rmdir, mknod and rename.

We mainly suspected the rename operation being lost after the reboot, since 
the metafile and block file renames happen when finalizing a block from 
BlocksBeingWritten to current (at least, we have not considered the block 
size here).
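
As an illustration, dirsync can be enabled per mount; the device and mount 
point below are hypothetical:

    # remount an existing DataNode data partition with synchronous directory updates
    mount -o remount,dirsync /data1

    # or persistently in /etc/fstab (sketch)
    /dev/sdb1  /data1  ext3  defaults,dirsync  0  2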

After testing with dirsync, however, we found a major performance hit.

Anyway, thanks a lot for your great & valuable time with us here. After 
checking the above OS logs, I will do a run with HDFS-1539.

On the performance hit: presently we are planning to tune the client app to 
use fewer threads, to reduce the load on the OS, and also to tune the data 
xceiver count at the DN (the current count is 4096, as the HBase team 
suggests); see the sketch below. Obviously, the underlying problem itself 
should still be rectified.
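
A sketch of that DN setting in hdfs-site.xml, using the key name from the 
0.20-era releases (note its historical spelling; newer releases rename it):

    <property>
      <!-- upper bound on DataNode transfer threads (xceivers);
           4096 is the value commonly recommended for HBase -->
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>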

Regards,
Uma


From: Todd Lipcon [t...@cloudera.com]
Sent: Thursday, November 24, 2011 5:07 AM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G <mahesw...@huawei.com> wrote:
> Yes, Todd, the block after restart is small and the genstamp is also lesser.
> Here a complete machine reboot happened. The boards are configured such that,
> if a thread gets no CPU cycles for 480 secs, the machine reboots itself:
> kernel.hung_task_timeout_secs = 480 sec.

So it sounds like the following happened:
- while writing the file, the pipeline got reduced down to 1 node due to
timeouts from the other two
- soon thereafter (before more replicas were made), the node holding that
last replica kernel-panicked without syncing the data
- on reboot, the filesystem lost some edits from its ext3 journal, and
the block got moved back into the RBW directory, with truncated data
- HDFS did the right thing (at least what the algorithms say it should
do), because it had gotten a commitment for a later replica

If you have a build which includes HDFS-1539, you could consider
setting dfs.datanode.synconclose to true, which would have prevented
this problem.
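
For example, in hdfs-site.xml on a build that includes HDFS-1539 (a minimal 
sketch; only the property name and value come from HDFS-1539):

    <property>
      <!-- fsync block file and metadata to disk when a block is finalized -->
      <name>dfs.datanode.synconclose</name>
      <value>true</value>
    </property>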

-Todd
--
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HADOOP-7862) Move the support for multiple protocols to lower layer so that Writable, PB and Avro can all use it

2011-11-25 Thread Sanjay Radia (Created) (JIRA)
Move the support for multiple protocols to lower layer so that Writable, PB and 
Avro can all use it
---

 Key: HADOOP-7862
 URL: https://issues.apache.org/jira/browse/HADOOP-7862
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Sanjay Radia
Assignee: Sanjay Radia




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira