Build failed in Jenkins: Hadoop-Common-0.23-Build #76
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/76/

[...truncated 7744 lines...]
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/target/hadoop-dist-0.23.1-SNAPSHOT.jar
[INFO]
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-dist ---
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-dist ---
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/target/hadoop-dist-0.23.1-SNAPSHOT.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.jar
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.pom
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ........................... SUCCESS [3.244s]
[INFO] Apache Hadoop Project POM .................... SUCCESS [0.055s]
[INFO] Apache Hadoop Annotations .................... SUCCESS [1.758s]
[INFO] Apache Hadoop Project Dist POM ............... SUCCESS [0.197s]
[INFO] Apache Hadoop Assemblies ..................... SUCCESS [0.086s]
[INFO] Apache Hadoop Auth ........................... SUCCESS [2.163s]
[INFO] Apache Hadoop Auth Examples .................. SUCCESS [1.217s]
[INFO] Apache Hadoop Common ......................... SUCCESS [24.449s]
[INFO] Apache Hadoop Common Project ................. SUCCESS [0.043s]
[INFO] Apache Hadoop HDFS ........................... SUCCESS [19.634s]
[INFO] Apache Hadoop HDFS Project ................... SUCCESS [0.040s]
[INFO] hadoop-yarn .................................. SUCCESS [0.048s]
[INFO] hadoop-yarn-api .............................. SUCCESS [7.137s]
[INFO] hadoop-yarn-common ........................... SUCCESS [9.139s]
[INFO] hadoop-yarn-server ........................... SUCCESS [0.020s]
[INFO] hadoop-yarn-server-common .................... SUCCESS [3.113s]
[INFO] hadoop-yarn-server-nodemanager ............... SUCCESS [5.732s]
[INFO] hadoop-yarn-server-web-proxy ................. SUCCESS [1.714s]
[INFO] hadoop-yarn-server-resourcemanager ........... SUCCESS [7.235s]
[INFO] hadoop-yarn-server-tests ..................... SUCCESS [0.863s]
[INFO] hadoop-mapreduce-client ...................... SUCCESS [0.035s]
[INFO] hadoop-mapreduce-client-core ................. SUCCESS [11.044s]
[INFO] hadoop-yarn-applications ..................... SUCCESS [0.020s]
[INFO] hadoop-yarn-applications-distributedshell .... SUCCESS [2.138s]
[INFO] hadoop-yarn-site ............................. SUCCESS [0.098s]
[INFO] hadoop-mapreduce-client-common ............... SUCCESS [6.119s]
[INFO] hadoop-mapreduce-client-shuffle .............. SUCCESS [1.511s]
[INFO] hadoop-mapreduce-client-app .................. SUCCESS [6.043s]
[INFO] hadoop-mapreduce-client-hs ................... SUCCESS [2.208s]
[INFO] hadoop-mapreduce-client-jobclient ............ SUCCESS [2.484s]
[INFO] Apache Hadoop MapReduce Examples ............. SUCCESS [2.719s]
[INFO] hadoop-mapreduce ............................. SUCCESS [0.121s]
[INFO] Apache Hadoop MapReduce Streaming ............ SUCCESS [3.020s]
[INFO] Apache Hadoop Tools .......................... SUCCESS [0.037s]
[INFO] Apache Hadoop Distribution ................... SUCCESS [0.068s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 2:06.648s
[INFO] Finished at: Fri Nov 25 09:05:19 UTC 2011
[INFO] Final Memory: 148M/978M
[INFO]
+ cd hadoop-common-project
+ /home/jenkins/tools/maven/latest/bin/mvn clean verify checkstyle:checkstyle findbugs:findbugs -DskipTests -Pdist -Dtar -Psrc -Pnative -Pdocs
[INFO] Scanning for projects...
[INFO]
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache Hadoop Annotations
[INFO] Apache Hadoop Auth
[INFO] Apache Hadoop Auth Examples
[INFO] Apache Hadoop Common
[INFO] Apache Hadoop Common Project
[INFO]
[INFO]
[INFO] Building Apache Hadoop Annotations 0.23.1-SNAPSHOT
[INFO]
[INFO]
[INFO] ---
[Update] RE: Blocks are getting corrupted under very high load
___
From: Uma Maheswara Rao G
Sent: Thursday, November 24, 2011 7:51 AM
To: hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: RE: Blocks are getting corrupted under very high load

We could reproduce the issue with some test code (without Hadoop). The issue looks to be the same one you pointed out. Thanks, Todd; we had finally started suspecting in that direction as well. We plan to capture the file details before and after the reboot, and with that analysis I can confirm whether it is the same issue or not. We cannot get the logs from before the reboot, because those logs are being lost as well :) , which is itself a clue. One more thing to note: in some cases the gap between the reboot time and the last replica finalization is ~1 hr. Since the machine rebooted due to kernel.hung_task_timeout_secs, that particular thread may never have gotten a chance to sync the data at the OS level either; same cause.

Great find, HDFS-1539. I have gone through all the related bugs; since it is filed as an improvement, it did not show up in my list :( .

We also found some OS-level mount options to make filesystem operations synchronous:

  dirsync: all directory updates within the filesystem are done synchronously. This affects the following system calls: creat, link, unlink, symlink, mkdir, rmdir, mknod and rename.

We mainly suspect that the rename operation is being lost after the reboot, since the meta file and block file are renamed when a block is finalized from BlocksBeingWritten to current (the block size is not even considered there). After testing with dirsync, though, we found a major performance hit.

Anyway, thanks a lot for your valuable time with us here. After checking the OS logs above, I will do a run with HDFS-1539; that also has a performance cost. For now we plan to tune the client app to use fewer threads, to reduce the load on the OS, and also to tune the data xceiver count at the DN (currently 4096, as the HBase team suggests). Obviously the underlying problem should still be fixed.
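The rename-loss hypothesis above comes down to a POSIX durability detail: rename() updates a directory entry, and that update is only guaranteed to survive a crash once the containing directory itself has been fsynced (which is effectively what the dirsync mount option forces for every directory update). A minimal sketch of a crash-safe rename, as a hypothetical standalone helper (not Hadoop code; Linux-specific because of O_DIRECTORY):

```python
import os

def durable_rename(src, dst):
    """Rename src to dst and fsync the destination's parent directory,
    so the rename itself is on disk before we report success.
    Without the directory fsync, a reboot can roll the rename back
    even though the file data was written."""
    os.rename(src, dst)
    parent = os.path.dirname(os.path.abspath(dst))
    # fsync the directory: this persists the new directory entry
    dirfd = os.open(parent, os.O_DIRECTORY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

This mirrors what finalizing a replica needs: the block file and meta file must not only be written, but their move from the blocks-being-written directory to current must itself be made durable.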
Regards,
Uma

___
From: Todd Lipcon [t...@cloudera.com]
Sent: Thursday, November 24, 2011 5:07 AM
To: common-dev@hadoop.apache.org
Cc: hdfs-...@hadoop.apache.org
Subject: Re: Blocks are getting corrupted under very high load

On Wed, Nov 23, 2011 at 1:23 AM, Uma Maheswara Rao G mahesw...@huawei.com wrote:
> Yes, Todd, the block after restart is smaller and the genstamp is also lower. Here a complete machine reboot happened. The boards are configured such that if a task gets no CPU cycles for 480 secs, the machine reboots itself (kernel.hung_task_timeout_secs = 480).

So it sounds like the following happened:
- While writing the file, the pipeline got reduced down to 1 node due to timeouts from the other two.
- Soon thereafter (before more replicas were made), the node holding that last replica kernel-panicked without syncing the data.
- On reboot, the filesystem lost some edits from its ext3 journal, and the block got moved back into the RBW directory with truncated data.
- HDFS did the right thing, or at least what the algorithms say it should do, because it had gotten a commitment for a later replica.

If you have a build which includes HDFS-1539, you could consider setting dfs.datanode.synconclose to true, which would have prevented this problem.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
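For reference, enabling the setting Todd mentions is a single property in hdfs-site.xml on builds that include HDFS-1539. A sketch (property name as given in the thread; the trade-off is the extra fsync cost on every block close that Uma measured):

```xml
<!-- hdfs-site.xml: have the DataNode sync block files to disk when a
     block is finalized, so finalized replicas survive a node crash or
     reboot. Costs additional I/O on every block close. -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```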
[jira] [Created] (HADOOP-7862) Move the support for multiple protocols to lower layer so that Writable, PB and Avro can all use it
Move the support for multiple protocols to lower layer so that Writable, PB and Avro can all use it
---

Key: HADOOP-7862
URL: https://issues.apache.org/jira/browse/HADOOP-7862
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Sanjay Radia
Assignee: Sanjay Radia

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira