Build failed in Jenkins: Hadoop-Common-trunk #1338
See https://builds.apache.org/job/Hadoop-Common-trunk/1338/changes

Changes:

[wheat9] HADOOP-10482. Fix various findbugs warnings in hadoop-common. Contributed by Haohui Mai.
[wheat9] HADOOP-11388. Remove deprecated o.a.h.metrics.file.FileContext. Contributed by Li Lu.
[aw] HADOOP-10950. rework heap management vars (John Smith via aw)
[aw] HADOOP-6590. Add a username check for hadoop sub-commands (John Smith via aw)
[aw] YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw)
[wheat9] HADOOP-11386. Replace \n by %n in format hadoop-common format strings. Contributed by Li Lu.
[wheat9] HDFS-5578. [JDK8] Fix Javadoc errors caused by incorrect or illegal tags in doc comments. Contributed by Andrew Purtell.
[arp] HDFS-7475. Make TestLazyPersistFiles#testLazyPersistBlocksAreSaved deterministic. (Contributed by Xiaoyu Yao)
[harsh] MAPREDUCE-5420. Remove mapreduce.task.tmp.dir from mapred-default.xml. Contributed by James Carman. (harsh)
[wheat9] HDFS-7463. Simplify FSNamesystem#getBlockLocationsUpdateTimes. Contributed by Haohui Mai.
[arp] HDFS-7503. Namenode restart after large deletions can cause slow processReport (Arpit Agarwal)

--
[...truncated 2956 lines...]
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-sls ---
[INFO] Compiling 6 source files to https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-sls ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-sls ---
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-sls
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-sls ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-sls
[INFO]
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-sls ---
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hadoop-sls ---
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-sls ---
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-sls ---
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.pom
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO]
[INFO]
[INFO] Building Apache Hadoop Tools Dist 3.0.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-tools-dist ---
[INFO] Deleting https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
    [mkdir] Created dir: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test-dir
    [mkdir] Created dir: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test/data
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ hadoop-tools-dist ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-tools-dist ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-tools-dist ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.3.1:jar (prepare-jar) @
[jira] [Created] (HADOOP-11390) Metrics 2 ganglia provider to include hostname in unresolved address problems
Steve Loughran created HADOOP-11390:
---------------------------------------

             Summary: Metrics 2 ganglia provider to include hostname in unresolved address problems
                 Key: HADOOP-11390
                 URL: https://issues.apache.org/jira/browse/HADOOP-11390
             Project: Hadoop Common
          Issue Type: Improvement
          Components: metrics
    Affects Versions: 2.6.0
            Reporter: Steve Loughran
            Priority: Minor

When metrics2/ganglia gets an unresolved hostname, the error it raises doesn't name the hostname in question, which makes the problem harder to track down.
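A minimal sketch of the kind of fix the report asks for; the helper class and method below are hypothetical illustrations, not the actual metrics2 patch:

import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public final class GangliaAddr {
  private GangliaAddr() {}

  /** Resolve a metrics sink target, naming the offending host on failure. */
  public static InetSocketAddress resolve(String host, int port)
      throws UnknownHostException {
    // The constructor already attempts DNS resolution.
    InetSocketAddress addr = new InetSocketAddress(host, port);
    if (addr.isUnresolved()) {
      // Surface the hostname itself, not just "unresolved address".
      throw new UnknownHostException(
          "Cannot resolve ganglia server \"" + host + "\" (port " + port + ")");
    }
    return addr;
  }
}

Because new InetSocketAddress(host, port) performs the lookup itself, the sketch only has to check isUnresolved() and rethrow with the missing context attached.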
Re: Hadoop without HDFS
one more thing: the "if" excludes object stores which don't offer consistency and atomic create-no-overwrite and rename. You can't run all Hadoop apps directly on top of Amazon S3 without extra work (see Netflix's S3mper). Object stores do not always behave as filesystems, even if they implement the relevant Hadoop APIs (some do, though, like Google's and Microsoft's).

HADOOP-9361 and the filesystem documentation attempt to formally specify what an FS should do:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html

Here "formally" means: try to rigorously define what HDFS does and how other filesystems (especially POSIX ones) differ.

HADOOP-9565 looks at an explicit ObjectStore subclass of FileSystem to provide more details on object stores.

On 10 December 2014 at 20:20, Ari King ari.brandeis.k...@gmail.com wrote:

Hi,

I'm doing a research paper on Hadoop -- specifically relating to its dependency on HDFS. I need to determine if and how HDFS can be replaced. As I understand it, there are a number of organizations that have produced HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive, HBase, etc.

With the "if" part being answered, I'd appreciate insight/guidance on the "how" part. Essentially, where can I find information on what MapReduce and the other Hadoop subprojects require of the underlying file system, and how these subprojects expect to interact with the file system.

Thanks!

Best,
Ari
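To make the pluggability point above concrete: Hadoop applications bind to storage through the FileSystem API, and the scheme of the URI selects the implementation at runtime, which is exactly what makes HDFS replaceable by any store that honors the specified semantics. A minimal sketch (illustrative, not from the thread; pass whatever store URI you have as the argument):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The URI scheme picks the FileSystem implementation -- hdfs://,
    // file://, s3n://, wasb://, ... HDFS is one binding among several.
    FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
  }
}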
Build failed in Jenkins: Hadoop-Common-trunk #1339
See https://builds.apache.org/job/Hadoop-Common-trunk/1339/

--
[...truncated 2944 lines...]
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-sls ---
[INFO] Compiling 6 source files to https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-sls ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-sls ---
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-sls
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-sls ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-sls
[INFO]
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-sls ---
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO]
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hadoop-sls ---
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-sls ---
[INFO]
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-sls ---
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.pom
[INFO] Installing https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO]
[INFO]
[INFO] Building Apache Hadoop Tools Dist 3.0.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-tools-dist ---
[INFO] Deleting https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
    [mkdir] Created dir: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test-dir
    [mkdir] Created dir: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test/data
[INFO] Executed tasks
[INFO]
[INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ hadoop-tools-dist ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-tools-dist ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-tools-dist ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.3.1:jar (prepare-jar) @ hadoop-tools-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/hadoop-tools-dist-3.0.0-SNAPSHOT.jar
[INFO]
[INFO] --- maven-jar-plugin:2.3.1:test-jar (prepare-test-jar) @ hadoop-tools-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/hadoop-tools-dist-3.0.0-SNAPSHOT-tests.jar
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO]
[INFO] maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist
[INFO]
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist ---
[INFO] No sources in project. Archive not created.
[INFO]
[INFO] maven-source-plugin:2.1.2:test-jar (default) @ hadoop-tools-dist
[INFO]
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks
Re: Solaris Port
FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c): hdfs_http_client.c and NativeIO.c as well.

On 12/11/2014 07:38 AM, malcolm wrote:

Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed to return from terror. Currently, terror just returns a static string from an array; this is fast, simple and error-proof.

Using strerror_r inside terror would require allocating a buffer inside terror and depending on the caller to free it after use, or passing a buffer to terror (which is basically the same as strerror_r, rendering terror redundant). Both cases require modification outside terror itself; as far as I can tell, there is no simple fix. Unless you have an alternative which I haven't thought of?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer from the caller. Advantage: a more modern, portable interface. Disadvantage: all calls to terror need to be modified, though all seem to be in a few files as far as I can tell.

2. Add a sys_errlist array (ifdeffed for Solaris). Advantage: no change to any calls to terror. Disadvantage: 2 additional files added to the source tree (.c and .h) and some minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave you to make the call.

Thanks for your patience,
Malcolm

On 12/10/2014 09:54 PM, Colin McCabe wrote:

On Wed, Dec 10, 2014 at 2:31 AM, malcolm malcolm.kaval...@oracle.com wrote:

Hi Colin,

Thanks for the hints around JIRAs. You are correct: errno still exists, however sys_errlist does not. Hadoop uses a function terror (defined in exception.c) which indexes sys_errlist by errno to return the error message from the array. This function is called 26 times in various places (in 2.2).

Originally, I thought to replace all calls to terror with strerror, but there can be issues with multi-threading (it returns a buffer which can be overwritten), so it seemed simpler just to recreate the sys_errlist message array. There is also a multi-threaded version, strerror_r, where you pass the buffer as a parameter, but this would necessitate changing every call to terror with multiple lines of code.

Why don't you just use strerror_r inside terror()? I wrote that code originally. The reason I didn't want to use strerror_r there is because GNU libc provides a non-POSIX definition of strerror_r, and forcing it to use the POSIX one is a pain. But you can do it. You also will require a thread-local buffer to hold the return from strerror_r, since it is not guaranteed to be static (although in practice it is 99% of the time -- another annoyance with the API).
Jenkins build is back to normal : Hadoop-Common-trunk #1340
See https://builds.apache.org/job/Hadoop-Common-trunk/1340/
Re: Solaris Port
sys_errlist was removed for a reason. Creating a fake sys_errlist on Solaris will mean libhadoop.so will need to be tied to a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility. That doesn't seem like a very good idea.

IMO, switching to strerror_r is much preferred since, other than the brain-dead GNU libc version, it is highly portable and should work regardless of the kernel or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:

FYI, there are a couple more files that reference sys_errlist directly (not just terror within exception.c): hdfs_http_client.c and NativeIO.c as well. [...]
[jira] [Created] (HADOOP-11391) Enabling HVE/node awareness does not rebalance replicas on data that existed prior to topology changes.
ellen johansen created HADOOP-11391:
---------------------------------------

             Summary: Enabling HVE/node awareness does not rebalance replicas on data that existed prior to topology changes.
                 Key: HADOOP-11391
                 URL: https://issues.apache.org/jira/browse/HADOOP-11391
             Project: Hadoop Common
          Issue Type: Bug
         Environment: VMWare w/ local storage
            Reporter: ellen johansen

Enabling HVE/node awareness does not rebalance replicas on data that existed prior to topology changes.

[root@vmw-d10-001 jenkins]# more /opt/cloudera/topology.data
10.20.xxx.161 /rack1/nodegroup1
10.20.xxx.162 /rack1/nodegroup1
10.20.xxx.163 /rack3/nodegroup1
10.20.xxx.164 /rack3/nodegroup1
172.17.xxx.71 /rack2/nodegroup1
172.17.xxx.72 /rack2/nodegroup1

before HVE:

/user/impalauser/tpcds/store_sales dir
/user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 block(s): OK
0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 10.20.xxx.163:20002]
1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.161:20002]
2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 172.17.xxx.72:20002]
5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 len=134217728 repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 10.20.xxx.163:20002]
7. BP-1184748135-172.17.xxx.71-1418235396548:blk_107374_1398 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 len=106721297 repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 172.17.xxx.72:20002]

Before enabling HVE:

Status: HEALTHY
 Total size: 1648156454 B (Total open files size: 498 B)
 Total dirs: 138
 Total files: 384
 Total symlinks: 0 (Files currently being written: 6)
 Total blocks (validated): 390 (avg. block size 4226042 B) (Total open file blocks (not validated): 6)
 Minimally replicated blocks: 390 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 1 (0.25641027 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 2.8564103
 Corrupt blocks: 0
 Missing replicas: 5 (0.44682753 %)
 Number of data-nodes: 5
 Number of racks: 1
FSCK ended at Wed Dec 10 14:04:35 EST 2014 in 50 milliseconds

The filesystem under path '/' is HEALTHY

--

after HVE (and NN restart):

/user/impalauser/tpcds/store_sales dir
/user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 block(s): OK
0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.161:20002]
2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.161:20002]
5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 len=134217728 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.162:20002]
7. BP-1184748135-172.17.xxx.71-1418235396548:blk_107374_1398 len=134217728 repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 len=106721297 repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.162:20002]

Status: HEALTHY
 Total size: 1659427036 B (Total open files size: 498 B)
 Total dirs: 176
 Total files: 529
 Total symlinks: 0 (Files currently being written: 6)
 Total blocks (validated): 532 (avg. block size 3119223 B) (Total open file blocks (not validated): 6)
 Minimally
Re: Solaris Port
Fine with me; I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:

sys_errlist was removed for a reason. Creating a fake sys_errlist on Solaris will mean libhadoop.so will need to be tied to a specific build (kernel/include pairing) and therefore limits upward mobility/compatibility. That doesn't seem like a very good idea. IMO, switching to strerror_r is much preferred since, other than the brain-dead GNU libc version, it is highly portable and should work regardless of the kernel or OS in place. [...]
RE: Solaris Port
Hi Malcolm,

Recently, I had to work on a function to get the system error message on various systems. Here is the piece of code I came up with. Hope it helps.

static void get_system_error_message(char *buf, int buf_len, int code)
{
#if defined(_WIN32)
    LPVOID lpMsgBuf;
    DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                 FORMAT_MESSAGE_FROM_SYSTEM |
                                 FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL,
                                 code,
                                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), /* Default language */
                                 (LPTSTR) &lpMsgBuf,
                                 0,
                                 NULL);
    if (status > 0) {
        strncpy(buf, (char *)lpMsgBuf, buf_len-1);
        buf[buf_len-1] = '\0';
        /* Free the buffer returned by system */
        LocalFree(lpMsgBuf);
    } else {
        _snprintf(buf, buf_len-1, "%s %d", "Can't get system error message for code", code);
        buf[buf_len-1] = '\0';
    }
#else
#if defined(_HPUX_SOURCE)
    {
        char *msg;
        errno = 0;
        msg = strerror(code);
        if (errno == 0) {
            strncpy(buf, msg, buf_len-1);
            buf[buf_len-1] = '\0';
        } else {
            snprintf(buf, buf_len, "%s %d", "Can't get system error message for code", code);
        }
    }
#else
    if (strerror_r(code, buf, buf_len) != 0) {
        snprintf(buf, buf_len, "%s %d", "Can't get system error message for code", code);
    }
#endif
#endif
}

Note that HPUX does not have strerror_r(), since strerror() itself is thread-safe there. Also, Windows does not have snprintf(); the equivalent function _snprintf() has a subtle difference in its interface.

-- Asokan

________________________________________
From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me; I volunteer to do this, if accepted. [...]
[jira] [Created] (HADOOP-11392) FileUtil.java leaks file descriptor when copybytes success.
Brahma Reddy Battula created HADOOP-11392:
---------------------------------------

             Summary: FileUtil.java leaks file descriptor when copybytes success.
                 Key: HADOOP-11392
                 URL: https://issues.apache.org/jira/browse/HADOOP-11392
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Brahma Reddy Battula

Please check the following code:

{code}
    try {
      in = srcFS.open(src);
      out = dstFS.create(dst, overwrite);
      IOUtils.copyBytes(in, out, conf, true);
    } catch (IOException e) {
      IOUtils.closeStream(out);
      IOUtils.closeStream(in);
      throw e;
    }
  }
{code}
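As context for the report: the usual leak-proof shape of this pattern releases both streams in a finally block, so they are closed whether the copy succeeds or throws. A hedged sketch of that pattern, assuming the same FileSystem/IOUtils APIs; the helper class is illustrative, not the committed HADOOP-11392 fix:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CopyHelper {
  /** Copy src to dst, guaranteeing both streams are closed on every path. */
  static void copy(FileSystem srcFS, Path src, FileSystem dstFS, Path dst,
                   boolean overwrite, Configuration conf) throws IOException {
    InputStream in = null;
    OutputStream out = null;
    try {
      in = srcFS.open(src);
      out = dstFS.create(dst, overwrite);
      // close=false: the finally block owns both streams on all paths.
      IOUtils.copyBytes(in, out, conf, false);
    } finally {
      IOUtils.closeStream(out);
      IOUtils.closeStream(in);
    }
  }
}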
[jira] [Created] (HADOOP-11393) Revert HADOOP_PREFIX, go back to HADOOP_HOME
Allen Wittenauer created HADOOP-11393:
---------------------------------------

             Summary: Revert HADOOP_PREFIX, go back to HADOOP_HOME
                 Key: HADOOP-11393
                 URL: https://issues.apache.org/jira/browse/HADOOP-11393
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Allen Wittenauer

Today, Windows and parts of the Hadoop source code still use HADOOP_HOME. The switch to HADOOP_PREFIX back in 0.21 or so didn't really accomplish what it was intended to do and only helped confuse the situation. _HOME is a much more standard suffix and is, in fact, used for everything in Hadoop except for the top level project home.

I think it would be beneficial to use HADOOP_HOME in the shell code as the Official(tm) variable, still honoring HADOOP_PREFIX if it is set.
[jira] [Created] (HADOOP-11394) hadoop-aws documentation missing.
Chris Nauroth created HADOOP-11394:
---------------------------------------

             Summary: hadoop-aws documentation missing.
                 Key: HADOOP-11394
                 URL: https://issues.apache.org/jira/browse/HADOOP-11394
             Project: Hadoop Common
          Issue Type: Bug
          Components: documentation
    Affects Versions: 2.7.0
            Reporter: Chris Nauroth
            Assignee: Chris Nauroth

In HADOOP-10714, the documentation source files for hadoop-aws were moved from src/site to src/main/site. The build is no longer actually generating the HTML site from these source files, because src/site is the expected path.
[jira] [Created] (HADOOP-11395) Add site documentation for Azure Storage FileSystem integration.
Chris Nauroth created HADOOP-11395:
---------------------------------------

             Summary: Add site documentation for Azure Storage FileSystem integration.
                 Key: HADOOP-11395
                 URL: https://issues.apache.org/jira/browse/HADOOP-11395
             Project: Hadoop Common
          Issue Type: Improvement
          Components: documentation
            Reporter: Chris Nauroth
            Assignee: Chris Nauroth

The scope of this issue is to add site documentation covering our Azure Storage FileSystem integration.
[jira] [Created] (HADOOP-11396) Provide navigation in the site documentation linking to the Hadoop Compatible File Systems.
Chris Nauroth created HADOOP-11396:
---------------------------------------

             Summary: Provide navigation in the site documentation linking to the Hadoop Compatible File Systems.
                 Key: HADOOP-11396
                 URL: https://issues.apache.org/jira/browse/HADOOP-11396
             Project: Hadoop Common
          Issue Type: Improvement
          Components: documentation
            Reporter: Chris Nauroth

We build site documentation for hadoop-aws and hadoop-openstack, and we'll soon have documentation for hadoop-azure. This documentation is not linked from the main site though, so unless a user knows the direct URL, they won't be able to find it. This issue proposes adding navigation to the site to make it easier to find these documents.
Re: submitting a hadoop patch doesn't trigger jenkins test run
Hi,

I wonder if anyone can help on resolving HADOOP-11320 https://issues.apache.org/jira/browse/HADOOP-11320 to increase the timeout for jenkins tests of cross-subproject patches?

Thanks a lot,

--Yongjun

On Tue, Dec 2, 2014 at 10:10 AM, Yongjun Zhang yzh...@cloudera.com wrote:

Hi,

Thank you all for the input. https://issues.apache.org/jira/browse/HADOOP-11320 was created for this issue. Welcome to give your further comments there.

Best,
--Yongjun

On Tue, Nov 25, 2014 at 10:26 PM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

+1 for increasing the test timeout for tests spanning multiple sub-projects.

I can see the value in what Steve L. suggested... if you make a major change that touches a particular subproject, you should try to get the approval of a committer who knows that subproject. But I don't think that forcing artificial patch splits is the way to do this... There are also some patches that are completely mechanical and don't really require the involvement of a YARN / HDFS committer, even if they change that project. For example, fixing a misspelling in the name of a hadoop-common API.

Colin

On Tue, Nov 25, 2014 at 8:45 AM, Yongjun Zhang yzh...@cloudera.com wrote:

Thanks all for the feedback. To summarize (and I have a suggestion at the end of this email), there are two scenarios:

1. A change that spans multiple *bigger* projects, e.g. hadoop, hbase.
2. A change that spans multiple *sub-projects* within hadoop, e.g. common, hdfs, yarn.

For 1, it's required for the change to be backward compatible, thus splitting the change across the multiple *bigger* projects is a must.

For 2, there are two sub-types:

- 2.1 those changes that can be made within hadoop sub-projects, and there is no external impact
- 2.2 those changes that have external impact, that is, the changes involve adding new APIs and marking old APIs deprecated, and corresponding changes in other *bigger* projects will have to be made independently. *But the changes within hadoop sub-projects can still be done altogether.*

I think (please correct me if I'm wrong):

- What Colin referred to is 2.1 and the within-hadoop changes of 2.2;
- Steve's "not for changes across hadoop-common and hdfs, or hadoop-common and yarn" means 2.1; Steve's "changes that only span hdfs-and-yarn would be fairly doubtful too" implies his doubt of the existence of 2.1.

For changes of 2.1 (if any) and *hadoop* changes of 2.2, we do have the option of making the change across all hadoop sub-projects altogether, to save the multiple steps Colin referred to. If this option is feasible, should we consider increasing the jenkins timeout for this kind of change (I mean making the timeout adjustable: if it's for a single sub-project, use the old timeout; otherwise, increase accordingly) so that we have at least this option when needed?

Thanks.

--Yongjun

On Tue, Nov 25, 2014 at 2:28 AM, Steve Loughran ste...@hortonworks.com wrote:

On 25 November 2014 at 00:58, Bernd Eckenfels e...@zusammenkunft.net wrote:

Hello,

Am Mon, 24 Nov 2014 16:16:00 -0800 schrieb Colin McCabe cmcc...@alumni.cmu.edu:

Conceptually, I think it's important to support patches that modify multiple sub-projects. Otherwise refactoring things in common becomes a multi-step process.

This might be rather philosophical (and I don't want to argue the need to have the patch infrastructure work for the multi-project case), however if a multi-project change cannot be applied in multiple steps it is probably also not safe at runtime (unless the multiple projects belong to a single instance/artifact). And then being forced to commit/compile/test in multiple steps actually increases the dependencies topology.

+1 for changes that span, say, hadoop and hbase. But not for changes across hadoop-common and hdfs, or hadoop-common and yarn. Changes that only span hdfs-and-yarn would be fairly doubtful too. There is a dependency graph in hadoop's own jars -- and cross-module (not cross-project) changes do need to happen.
Re: Solaris Port
Hi Asokan,

I googled and found that Windows has strerror, and strerror_s (which is the strerror_r equivalent). Is there a reason why you didn't use this call?

On 12/11/2014 06:27 PM, Asokan, M wrote:

Hi Malcolm,

Recently, I had to work on a function to get the system error message on various systems. Here is the piece of code I came up with. Hope it helps. [...]

Note that HPUX does not have strerror_r(), since strerror() itself is thread-safe there. Also, Windows does not have snprintf(); the equivalent function _snprintf() has a subtle difference in its interface.

-- Asokan [...]
[jira] [Created] (HADOOP-11397) Can't override HADOOP_IDENT_STRING
Allen Wittenauer created HADOOP-11397:
---------------------------------------

             Summary: Can't override HADOOP_IDENT_STRING
                 Key: HADOOP-11397
                 URL: https://issues.apache.org/jira/browse/HADOOP-11397
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Allen Wittenauer
            Priority: Trivial

Simple typo in hadoop_basic_init:

HADOOP_IDENT_STRING=${HADOP_IDENT_STRING:-$USER}
RE: Solaris Port
Hi Malcolm,

The Windows versions of the strerror() and strerror_s() functions are probably meant for ANSI C library functions that set errno. For core Windows API calls (like UNIX system calls), one gets the error number by calling the GetLastError() function. In the code snippet I sent earlier, the code argument is the value returned by GetLastError(). Neither strerror() nor strerror_s() will give the correct error message for this error code.

You could probably look at libwinutils.c in the Hadoop source. It uses FormatMessageW (which returns messages in Unicode). My requirement was to return messages in the current system locale.

-- Asokan

________________________________________
From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 4:04 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Hi Asokan,

I googled and found that Windows has strerror, and strerror_s (which is the strerror_r equivalent). Is there a reason why you didn't use this call? [...]
[jira] [Created] (HADOOP-11398) RetryUpToMaximumTimeWithFixedSleep needs to behave more accurately
Li Lu created HADOOP-11398:
---------------------------------------

             Summary: RetryUpToMaximumTimeWithFixedSleep needs to behave more accurately
                 Key: HADOOP-11398
                 URL: https://issues.apache.org/jira/browse/HADOOP-11398
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Li Lu
            Assignee: Li Lu

RetryUpToMaximumTimeWithFixedSleep now inherits RetryUpToMaximumCountWithFixedSleep and just acts as a wrapper to decide maxRetries. The current implementation uses (maxTime / sleepTime) as the number of maxRetries. This is fine if the actual time taken by each retry is significantly less than the sleep time, but it becomes less accurate if each retry takes a comparable amount of time to the sleep time. The problem gets worse when there are underlying retries. We may want to use timers inside RetryUpToMaximumTimeWithFixedSleep to perform accurate timing.
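For illustration, a deadline-based loop of the kind the report suggests could look like the following sketch. This is plain Java for exposition, not Hadoop's actual RetryPolicy API; the Action interface and method names are hypothetical:

public final class DeadlineRetry {
  public interface Action<T> { T run() throws Exception; }

  /** Retry until a wall-clock deadline passes, not a precomputed count. */
  public static <T> T retryUpToMaximumTime(Action<T> action, long maxTimeMs,
                                           long sleepMs) throws Exception {
    long deadline = System.currentTimeMillis() + maxTimeMs;
    while (true) {
      try {
        return action.run();
      } catch (Exception e) {
        // Elapsed time includes the time spent inside each attempt, so slow
        // attempts (or nested retries) cannot stretch the total window the
        // way a fixed (maxTime / sleepTime) retry count can.
        if (System.currentTimeMillis() + sleepMs >= deadline) {
          throw e;
        }
        Thread.sleep(sleepMs);
      }
    }
  }
}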
Re: submitting a hadoop patch doesn't trigger jenkins test run
Many thanks to Ted Yu, Steve Loughran and Andrew Wang for replying in the jira, and to Steve/Andrew for making the related changes!

--Yongjun

On Thu, Dec 11, 2014 at 12:41 PM, Yongjun Zhang yzh...@cloudera.com wrote:

Hi,

I wonder if anyone can help on resolving HADOOP-11320 https://issues.apache.org/jira/browse/HADOOP-11320 to increase the timeout for jenkins tests of cross-subproject patches?

Thanks a lot,

--Yongjun [...]
Re: submitting a hadoop patch doesn't trigger jenkins test run
Sorry, my bad: I named Andrew Wang instead of Andrew Bayer in my last mail; both of them helped anyway :-) So thanks to all for the help on this matter!

--Yongjun

On Thu, Dec 11, 2014 at 3:38 PM, Yongjun Zhang yzh...@cloudera.com wrote:

Many thanks to Ted Yu, Steve Loughran and Andrew Wang for replying in the jira, and to Steve/Andrew for making the related changes! [...]
[jira] [Resolved] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haohui Mai resolved HADOOP-11389.
---------------------------------
    Resolution: Fixed

I've committed the patch to trunk and branch-2.

Clean up byte to string encoding issues in hadoop-common

                 Key: HADOOP-11389
                 URL: https://issues.apache.org/jira/browse/HADOOP-11389
             Project: Hadoop Common
          Issue Type: Sub-task
            Reporter: Haohui Mai
            Assignee: Haohui Mai
         Attachments: HADOOP-11389.000.patch, HADOOP-11389.001.patch

Much code in hadoop-common converts bytes to strings using the default charset. The behavior of the conversion depends on the platform's encoding settings, which is flagged by newer versions of findbugs. This jira proposes to fix the findbugs warnings.
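For readers following the findbugs angle: the warning in question (DM_DEFAULT_ENCODING) goes away once every byte/String conversion names its charset explicitly. A small self-contained example of the pattern, not code from the patch:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharset {
  public static void main(String[] args) {
    String original = "r\u00e9sum\u00e9";
    // Explicit charset: byte-for-byte identical on every platform, and
    // findbugs' DM_DEFAULT_ENCODING warning does not fire.
    byte[] encoded = original.getBytes(StandardCharsets.UTF_8);
    String decoded = new String(encoded, StandardCharsets.UTF_8);
    System.out.println(decoded.equals(original)); // true everywhere
    System.out.println(Arrays.toString(encoded)); // platform-independent bytes
  }
}

With the platform default charset instead, the same program could print different bytes on a machine configured for, say, ISO-8859-1, which is exactly the nondeterminism the JIRA removes.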
Re: Solaris Port
So, it turns out that if I had naively changed all calls to terror, or references to sys_errlist, to use strerror_r, then I would have broken the code for Windows and HPUX (and possibly other OSes).

If we assume that the current code runs fine on all platforms (maybe even AIX and MacOS, for example), then any change/addition made to the code and not ifdeffed appropriately can break other OSes. On the other hand, too many ifdefs can pollute the source code and render it less readable (though that is possibly less important).

In the general case, what are code contributors' responsibilities when adding code, regarding OSes besides Linux? What OSes does jenkins test on? I guess maintainers of code on non-tested platforms are responsible for their own testing? How do we avoid the ping-pong effect, i.e. I make a generic change to the code which breaks on Windows, then the Windows maintainer reverts the change in a way that breaks on Solaris, for example? Or does this not happen in actuality?

On 12/11/2014 11:25 PM, Asokan, M wrote:

Hi Malcolm,

The Windows versions of the strerror() and strerror_s() functions are probably meant for ANSI C library functions that set errno. For core Windows API calls (like UNIX system calls), one gets the error number by calling the GetLastError() function. [...]