Build failed in Jenkins: Hadoop-Common-trunk #1338

2014-12-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/1338/changes

Changes:

[wheat9] HADOOP-10482. Fix various findbugs warnings in hadoop-common. 
Contributed by Haohui Mai.

[wheat9] HADOOP-11388. Remove deprecated o.a.h.metrics.file.FileContext. 
Contributed by Li Lu.

[aw] HADOOP-10950. rework heap management vars (John Smith via aw)

[aw] HADOOP-6590. Add a username check for hadoop sub-commands (John Smith via 
aw)

[aw] YARN-2437. start-yarn.sh/stop-yarn should give info (Varun Saxena via aw)

[wheat9] HADOOP-11386. Replace \n by %n in format hadoop-common format strings. 
Contributed by Li Lu.

[wheat9] HDFS-5578. [JDK8] Fix Javadoc errors caused by incorrect or illegal 
tags in doc comments. Contributed by Andrew Purtell.

[arp] HDFS-7475. Make TestLazyPersistFiles#testLazyPersistBlocksAreSaved 
deterministic. (Contributed by Xiaoyu Yao)

[harsh] MAPREDUCE-5420. Remove mapreduce.task.tmp.dir from mapred-default.xml. 
Contributed by James Carman. (harsh)

[wheat9] HDFS-7463. Simplify FSNamesystem#getBlockLocationsUpdateTimes. 
Contributed by Haohui Mai.

[arp] HDFS-7503. Namenode restart after large deletions can cause slow 
processReport (Arpit Agarwal)

--
[...truncated 2956 lines...]
[INFO] Using default encoding to copy filtered resources.
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
hadoop-sls ---
[INFO] Compiling 6 source files to 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-sls ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-sls ---
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] 
[INFO] >>> maven-source-plugin:2.1.2:jar (default) @ hadoop-sls >>>
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-sls ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] <<< maven-source-plugin:2.1.2:jar (default) @ hadoop-sls <<<
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-sls ---
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hadoop-sls ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-sls ---
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-sls ---
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.pom
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO] 
[INFO] 
[INFO] Building Apache Hadoop Tools Dist 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-tools-dist 
---
[INFO] Deleting 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test-dir
[mkdir] Created dir: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test/data
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ 
hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ 
hadoop-tools-dist ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ 
hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
hadoop-tools-dist ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-tools-dist 
---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (prepare-jar) @ 

[jira] [Created] (HADOOP-11390) Metrics 2 ganglia provider to include hostname in unresolved address problems

2014-12-11 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-11390:
---

 Summary: Metrics 2 ganglia provider to include hostname in 
unresolved address problems
 Key: HADOOP-11390
 URL: https://issues.apache.org/jira/browse/HADOOP-11390
 Project: Hadoop Common
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.6.0
Reporter: Steve Loughran
Priority: Minor


When metrics2/ganglia gets an unresolved hostname, it doesn't include the 
hostname in question in the error message, making the failure harder to track down.
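
A minimal sketch of the improvement being asked for (the class and method 
below are hypothetical, not the actual metrics2/ganglia code): surface the 
offending host:port when an address fails to resolve.

{code}
import java.net.InetSocketAddress;

public final class ResolvedAddress {
  // Hypothetical helper: fail with the offending host:port in the message
  // instead of a bare "unresolved address" error.
  public static InetSocketAddress require(String host, int port) {
    InetSocketAddress addr = new InetSocketAddress(host, port);
    if (addr.isUnresolved()) {
      throw new IllegalArgumentException(
          "Could not resolve ganglia server address: " + host + ":" + port);
    }
    return addr;
  }
}
{code}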




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Hadoop without HDFS

2014-12-11 Thread Steve Loughran
One more thing: the "if" excludes object stores which don't offer
consistency and atomic create-no-overwrite and rename. You can't run all
Hadoop apps directly on top of Amazon S3 without extra work (see Netflix's
S3mper). Object stores do not always behave as filesystems, even if they
implement the relevant Hadoop APIs (some do, though, like Google's and
Microsoft's).

HADOOP-9361 and the filesystem documentation attempt to formally specify
what an FS should do:

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html

Here "formally" means trying to rigorously define what HDFS does and how
other filesystems (especially POSIX ones) differ.

HADOOP-9565 looks at an explicit ObjectStore subclass of FileSystem to
provide more details on object stores.
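
On the "how" part of the question below: applications that code against the
org.apache.hadoop.fs.FileSystem API can be pointed at a different store
purely by URI scheme and configuration. A minimal sketch (the s3a scheme and
bucket name are placeholders for whichever Hadoop-compatible filesystem is
plugged in, not a recommendation):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListAlternateStore {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The URI scheme selects the FileSystem implementation; hdfs:// is not
    // special to application code, it is just one of the registered schemes.
    FileSystem fs = FileSystem.get(URI.create("s3a://example-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}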

On 10 December 2014 at 20:20, Ari King ari.brandeis.k...@gmail.com wrote:

 Hi,

 I'm doing a research paper on Hadoop -- specifically relating to its
 dependency on HDFS. I need to determine if and how HDFS can be replaced. As
 I understand it, there are a number of organizations that have produced
 HDFS alternatives that support the Hadoop ecosystem, i.e. MapReduce, Hive,
 HBase, etc.

 With the "if" part being answered, I'd appreciate insight/guidance on the
 "how" part. Essentially, where can I find information on what MapReduce and
 the other Hadoop subprojects require of the underlying file system, and how
 these subprojects expect to interact with the file system?

 Thanks!

 Best,
 Ari




Build failed in Jenkins: Hadoop-Common-trunk #1339

2014-12-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/1339/

--
[...truncated 2944 lines...]
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
hadoop-sls ---
[INFO] Compiling 6 source files to 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-sls ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ hadoop-sls ---
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] 
[INFO] >>> maven-source-plugin:2.1.2:jar (default) @ hadoop-sls >>>
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-sls ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] <<< maven-source-plugin:2.1.2:jar (default) @ hadoop-sls <<<
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-sls ---
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hadoop-sls ---
[INFO] 
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (depcheck) @ hadoop-sls ---
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-sls ---
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.jar
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT.pom
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-sls/target/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-sls/3.0.0-SNAPSHOT/hadoop-sls-3.0.0-SNAPSHOT-sources.jar
[INFO] 
[INFO] 
[INFO] Building Apache Hadoop Tools Dist 3.0.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-tools-dist 
---
[INFO] Deleting 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test-dir
[mkdir] Created dir: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/test/data
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ 
hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ 
hadoop-tools-dist ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ 
hadoop-tools-dist ---
[INFO] Using default encoding to copy filtered resources.
[INFO] 
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ 
hadoop-tools-dist ---
[INFO] No sources to compile
[INFO] 
[INFO] --- maven-surefire-plugin:2.17:test (default-test) @ hadoop-tools-dist 
---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (prepare-jar) @ hadoop-tools-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/hadoop-tools-dist-3.0.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:test-jar (prepare-test-jar) @ 
hadoop-tools-dist ---
[WARNING] JAR will be empty - no content was marked for inclusion!
[INFO] Building jar: 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/hadoop-tools/hadoop-tools-dist/target/hadoop-tools-dist-3.0.0-SNAPSHOT-tests.jar
[INFO] 
[INFO] >>> maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist >>>
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] <<< maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist <<<
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar (default) @ hadoop-tools-dist ---
[INFO] No sources in project. Archive not created.
[INFO] 
[INFO] >>> maven-source-plugin:2.1.2:test-jar (default) @ hadoop-tools-dist >>>
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-tools-dist ---
[INFO] Executing tasks


Re: Solaris Port

2014-12-11 Thread malcolm
FYI, there are a couple more files that reference sys_errlist directly 
(not just terror within exception.c): hdfs_http_client.c and NativeIO.c.


On 12/11/2014 07:38 AM, malcolm wrote:

Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed 
to return from terror.
Currently, terror just returns a static string from an array; this is 
fast, simple and error-proof.

Using strerror_r inside terror would require allocating a buffer inside 
terror and depending on the caller to free it after use, or passing a 
buffer to terror (which is basically the same as strerror_r, rendering 
terror redundant).
Both cases require modification outside terror itself; as far as I can 
tell there is no simple fix, unless you have an alternative which I 
haven't thought of?


As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer 
from the caller.

   Advantage: a more modern, portable interface.
   Disadvantage: all calls to terror need to be modified, though they all 
seem to be in a few files as far as I can tell.

2. Add a sys_errlist array (ifdef'd for Solaris).

   Advantage: no change to any calls to terror.
   Disadvantage: two additional files added to the source tree (.c and .h) 
and some minor ifdefs only used for Solaris.


I think it is more a question of style than anything else, so I leave 
you to make the call.


Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:
On Wed, Dec 10, 2014 at 2:31 AM, malcolm 
malcolm.kaval...@oracle.com wrote:

Hi Colin,

Thanks for the hints around JIRAs.

You are correct, errno still exists; however, sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array. This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror, but
there can be issues with multi-threading (it returns a buffer which can be
overwritten), so it seemed simpler just to recreate the sys_errlist message
array.

There is also a multi-threaded version strerror_r where you pass the buffer
as a parameter, but this would necessitate changing every call to terror
with multiple lines of code.

Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain. But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).








Jenkins build is back to normal : Hadoop-Common-trunk #1340

2014-12-11 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/1340/



Re: Solaris Port

2014-12-11 Thread Allen Wittenauer

sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
will mean libhadoop.so will need to be tied to a specific build 
(kernel/include pairing), which limits upward mobility/compatibility.  
That doesn't seem like a very good idea.  

IMO, switching to strerror_r is much preferred since, other than the brain-dead 
GNU libc version, it is highly portable and should work regardless of the kernel 
or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:

 FYI, there are a couple more files that reference sys_errlist directly (not 
 just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c
 
 On 12/11/2014 07:38 AM, malcolm wrote:
 Hi Colin,
 
 Exactly, as you noticed, the problem is the thread-local buffer needed to 
 return from terror.
 Currently, terror just returns a static string from an array, this is fast, 
 simple and error-proof.
 
 In order to use strerror_r inside terror,  would require allocating a buffer 
 inside terror  and depend on the caller to free the buffer after using it, 
 or to pass a buffer to terrror (which is basically the same as strerror_r, 
 rendering terror redundant).
 Both cases require modification outside terror itself, as far as I can tell, 
 no simple fix. Unless you have an alternative which I haven't thought of ?
 
 As far as I can tell, we have two choices:
 
 1. Remove terror and replace calls with strerror_r, passing a buffer from 
 the callee.
Advantage: a more modern portable interface.
Disadvantage: All calls to terror need to be modified, though all seem to 
 be in a few files as far as I can tell.
 
 2. Adding a sys_errlist array (ifdeffed for Solaris)
Advantage: no change to any calls to terror
Disadvantage: 2 additional files added to source tree (.c and .h) and 
 some minor ifdefs only used for Solaris.
 
 I think it is more a question of style than anything else, so I leave you to 
 make the call.
 
 Thanks for your patience,
 Malcolm
 
 
 
 
 
 On 12/10/2014 09:54 PM, Colin McCabe wrote:
 On Wed, Dec 10, 2014 at 2:31 AM, malcolm malcolm.kaval...@oracle.com 
 wrote:
 Hi Colin,
 
 Thanks for the hints around JIRAs.
 
 You are correct errno still exists, however sys_errlist does not.
 
 Hadoop uses a function terror (defined in exception.c) which indexes
 sys_errlist by errno to return the error message from the array. This
 function is called 26 times in various places (in 2.2)
 
 Originally, I thought to replace all calls to terror with strerror, but
 there can be issues with multi-threading (it returns a buffer which can be
 overwritten), so it seemed simpler just to recreate the sys_errlist message
 array.
 
 There is also a multi-threaded version strerror_r where you pass the buffer
 as a parameter, but this would necessitate changing every call to terror
 with mutiple lines of code.
 Why don't you just use strerror_r inside terror()?
 
 I wrote that code originally.  The reason I didn't want to use
 strerror_r there is because GNU libc provides a non-POSIX definition
 of strerror_r, and forcing it to use the POSIX one is a pain. But you
 can do it.  You also will require a thread-local buffer to hold the
 return from strerror_r, since it is not guaranteed to be static
 (although in practice it is 99% of the time-- another annoyance with
 the API).
 
 
 
 



[jira] [Created] (HADOOP-11391) Enabling HVE/node awareness does not rebalance replicas on data that existed prior to topology changes.

2014-12-11 Thread ellen johansen (JIRA)
ellen johansen created HADOOP-11391:
---

 Summary: Enabling HVE/node awareness does not rebalance replicas 
on data that existed prior to topology changes. 
 Key: HADOOP-11391
 URL: https://issues.apache.org/jira/browse/HADOOP-11391
 Project: Hadoop Common
  Issue Type: Bug
 Environment: VMWare w/ local storage
Reporter: ellen johansen


Enabling HVE/node awareness does not rebalance replicas on data that existed 
prior to topology changes. 

[root@vmw-d10-001 jenkins]# more /opt/cloudera/topology.data 
10.20.xxx.161   /rack1/nodegroup1
10.20.xxx.162   /rack1/nodegroup1
10.20.xxx.163   /rack3/nodegroup1
10.20.xxx.164   /rack3/nodegroup1
172.17.xxx.71   /rack2/nodegroup1
172.17.xxx.72   /rack2/nodegroup1

before HVE:
/user/impalauser/tpcds/store_sales dir
/user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 
block(s):  OK
0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 10.20.xxx.163:20002]
1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 len=134217728 
repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.161:20002]
2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 len=134217728 
repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 len=134217728 
repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.161:20002, 172.17.xxx.72:20002]
5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 len=134217728 
repl=3 [10.20.xxx.164:20002, 172.17.xxx.72:20002, 10.20.xxx.163:20002]
6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 10.20.xxx.163:20002]
7. BP-1184748135-172.17.xxx.71-1418235396548:blk_107374_1398 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 len=106721297 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.162:20002, 172.17.xxx.72:20002]
-

Before enabling HVE:
Status: HEALTHY
 Total size:1648156454 B (Total open files size: 498 B)
 Total dirs:138
 Total files:   384
 Total symlinks:0 (Files currently being written: 6)
 Total blocks (validated):  390 (avg. block size 4226042 B) (Total open 
file blocks (not validated): 6)
 Minimally replicated blocks:   390 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   1 (0.25641027 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 2.8564103
 Corrupt blocks:0
 Missing replicas:  5 (0.44682753 %)
 Number of data-nodes:  5
 Number of racks:   1
FSCK ended at Wed Dec 10 14:04:35 EST 2014 in 50 milliseconds

The filesystem under path '/' is HEALTHY

--
after HVE (and NN restart):

/user/impalauser/tpcds/store_sales dir
/user/impalauser/tpcds/store_sales/store_sales.dat 1180463121 bytes, 9 
block(s):  OK
0. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742xxx_1382 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
1. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742213_1389 len=134217728 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.161:20002]
2. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742214_1390 len=134217728 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
3. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742215_1391 len=134217728 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
4. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742216_1392 len=134217728 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.161:20002]
5. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742217_1393 len=134217728 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.163:20002]
6. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742220_1396 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.162:20002]
7. BP-1184748135-172.17.xxx.71-1418235396548:blk_107374_1398 len=134217728 
repl=3 [10.20.xxx.164:20002, 10.20.xxx.163:20002, 10.20.xxx.161:20002]
8. BP-1184748135-172.17.xxx.71-1418235396548:blk_1073742224_1400 len=106721297 
repl=3 [172.17.xxx.72:20002, 10.20.xxx.164:20002, 10.20.xxx.162:20002]

Status: HEALTHY
 Total size:1659427036 B (Total open files size: 498 B)
 Total dirs:176
 Total files:   529
 Total symlinks:0 (Files currently being written: 6)
 Total blocks (validated):  532 (avg. block size 3119223 B) (Total open 
file blocks (not validated): 6)
 Minimally 

Re: Solaris Port

2014-12-11 Thread malcolm

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:

sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
will mean the libhadoop.so will need to be tied a specific build 
(kernel/include pairing) and therefore limits upward mobility/compatibility.  
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since other than the brain-dead 
GNU libc version, is highly portable and should work regardless of the kernel 
or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:


FYI, there are a couple more files that reference sys_errlist directly (not 
just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:

Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed to 
return from terror.
Currently, terror just returns a static string from an array, this is fast, 
simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a buffer 
inside terror  and depend on the caller to free the buffer after using it, or 
to pass a buffer to terrror (which is basically the same as strerror_r, 
rendering terror redundant).
Both cases require modification outside terror itself, as far as I can tell, no 
simple fix. Unless you have an alternative which I haven't thought of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer from the 
callee.
Advantage: a more modern portable interface.
Disadvantage: All calls to terror need to be modified, though all seem to 
be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
Advantage: no change to any calls to terror
Disadvantage: 2 additional files added to source tree (.c and .h) and some 
minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave you to 
make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:

On Wed, Dec 10, 2014 at 2:31 AM, malcolm malcolm.kaval...@oracle.com wrote:

Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array. This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror, but
there can be issues with multi-threading (it returns a buffer which can be
overwritten), so it seemed simpler just to recreate the sys_errlist message
array.

There is also a multi-threaded version strerror_r where you pass the buffer
as a parameter, but this would necessitate changing every call to terror
with mutiple lines of code.

Why don't you just use strerror_r inside terror()?

I wrote that code originally.  The reason I didn't want to use
strerror_r there is because GNU libc provides a non-POSIX definition
of strerror_r, and forcing it to use the POSIX one is a pain. But you
can do it.  You also will require a thread-local buffer to hold the
return from strerror_r, since it is not guaranteed to be static
(although in practice it is 99% of the time-- another annoyance with
the API).






RE: Solaris Port

2014-12-11 Thread Asokan, M
Hi Malcolm,
   Recently, I had to work on a function to get the system error message on 
various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
    LPVOID lpMsgBuf;
    DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                 FORMAT_MESSAGE_FROM_SYSTEM |
                                 FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL, code,
                                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                 /* Default language */
                                 (LPTSTR) &lpMsgBuf, 0, NULL);
    if (status > 0)
    {
        strncpy(buf, (char *)lpMsgBuf, buf_len-1);
        buf[buf_len-1] = '\0';
        /* Free the buffer returned by system */
        LocalFree(lpMsgBuf);
    }
    else
    {
        _snprintf(buf, buf_len-1, "%s %d",
                  "Can't get system error message for code", code);
        buf[buf_len-1] = '\0';
    }
#else
#if defined(_HPUX_SOURCE)
    {
        char * msg;
        errno = 0;
        msg = strerror(code);
        if (errno == 0)
        {
            strncpy(buf, msg, buf_len-1);
            buf[buf_len-1] = '\0';
        }
        else
        {
            snprintf(buf, buf_len, "%s %d",
                     "Can't get system error message for code", code);
        }
    }
#else
    if (strerror_r(code, buf, buf_len) != 0)
    {
        snprintf(buf, buf_len, "%s %d",
                 "Can't get system error message for code", code);
    }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is 
thread-safe.  Also Windows does not have snprintf().  The equivalent function 
_snprintf() has a subtle difference in its interface.

-- Asokan

From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
 sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
 will mean the libhadoop.so will need to be tied a specific build 
 (kernel/include pairing) and therefore limits upward mobility/compatibility.  
 That doesn’t seem like a very good idea.

 IMO, switching to strerror_r is much preferred, since other than the 
 brain-dead GNU libc version, is highly portable and should work regardless of 
 the kernel or OS in place.

 On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:

 FYI, there are a couple more files that reference sys_errlist directly (not 
 just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c

 On 12/11/2014 07:38 AM, malcolm wrote:
 Hi Colin,

 Exactly, as you noticed, the problem is the thread-local buffer needed to 
 return from terror.
 Currently, terror just returns a static string from an array, this is fast, 
 simple and error-proof.

 In order to use strerror_r inside terror,  would require allocating a 
 buffer inside terror  and depend on the caller to free the buffer after 
 using it, or to pass a buffer to terrror (which is basically the same as 
 strerror_r, rendering terror redundant).
 Both cases require modification outside terror itself, as far as I can 
 tell, no simple fix. Unless you have an alternative which I haven't thought 
 of ?

 As far as I can tell, we have two choices:

 1. Remove terror and replace calls with strerror_r, passing a buffer from 
 the callee.
 Advantage: a more modern portable interface.
 Disadvantage: All calls to terror need to be modified, though all seem 
 to be in a few files as far as I can tell.

 2. Adding a sys_errlist array (ifdeffed for Solaris)
 Advantage: no change to any calls to terror
 Disadvantage: 2 additional files added to source tree (.c and .h) and 
 some minor ifdefs only used for Solaris.

 I think it is more a question of style than anything else, so I leave you 
 to make the call.

 Thanks for your patience,
 Malcolm





 On 12/10/2014 09:54 PM, Colin McCabe wrote:
 On Wed, Dec 10, 2014 at 2:31 AM, malcolm malcolm.kaval...@oracle.com 
 wrote:
 Hi Colin,

 Thanks for the hints around JIRAs.

 You are correct errno still exists, however sys_errlist does not.

 Hadoop uses a function terror (defined in exception.c) which indexes
 sys_errlist by errno to return the error message from the array. This
 function is called 26 times in various places (in 2.2)

 Originally, I thought to replace all calls to terror with strerror, but
 there can be issues with multi-threading (it returns a buffer which can be
 overwritten), so it seemed simpler just to recreate the sys_errlist 
 message
 array.

 There is also a multi-threaded version strerror_r where you pass the 
 buffer
 as a parameter, but this would necessitate changing every call to terror
 with mutiple lines 

[jira] [Created] (HADOOP-11392) FileUtil.java leaks file descriptor when copybytes success.

2014-12-11 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created HADOOP-11392:
-

 Summary: FileUtil.java leaks file descriptor when copybytes 
success.
 Key: HADOOP-11392
 URL: https://issues.apache.org/jira/browse/HADOOP-11392
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Brahma Reddy Battula


 Please check the following code for the issue:
{code}
try {
in = srcFS.open(src);
out = dstFS.create(dst, overwrite);
IOUtils.copyBytes(in, out, conf, true);
  } catch (IOException e) {
IOUtils.closeStream(out);
IOUtils.closeStream(in);
throw e;
  }
}
{code}
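
One possible restructuring (a sketch only, not the committed fix) is to do the 
cleanup in a finally block, reusing the variables from the snippet above, and 
let copyBytes leave the streams open:

{code}
InputStream in = null;
OutputStream out = null;
try {
  in = srcFS.open(src);
  out = dstFS.create(dst, overwrite);
  // close == false: stream lifetime is managed by the finally block below
  IOUtils.copyBytes(in, out, conf, false);
} finally {
  IOUtils.closeStream(out);
  IOUtils.closeStream(in);
}
{code}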



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11393) Revert HADOOP_PREFIX, go back to HADOOP_HOME

2014-12-11 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11393:
-

 Summary: Revert HADOOP_PREFIX, go back to HADOOP_HOME
 Key: HADOOP-11393
 URL: https://issues.apache.org/jira/browse/HADOOP-11393
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Allen Wittenauer


Today, Windows and parts of the Hadoop source code still use HADOOP_HOME.  The 
switch to HADOOP_PREFIX back in 0.21 or so didn't really accomplish what it was 
intended to do and only helped confuse the situation.

_HOME is a much more standard suffix and is, in fact, used for everything in 
Hadoop except for the top level project home.  I think it would be beneficial 
to use HADOOP_HOME in the shell code as the Official(tm) variable, still 
honoring HADOOP_PREFIX if it is set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11394) hadoop-aws documentation missing.

2014-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-11394:
--

 Summary: hadoop-aws documentation missing.
 Key: HADOOP-11394
 URL: https://issues.apache.org/jira/browse/HADOOP-11394
 Project: Hadoop Common
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth


In HADOOP-10714, the documentation source files for hadoop-aws were moved from 
src/site to src/main/site.  The build is no longer actually generating the HTML 
site from these source files, because src/site is the expected path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11395) Add site documentation for Azure Storage FileSystem integration.

2014-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-11395:
--

 Summary: Add site documentation for Azure Storage FileSystem 
integration.
 Key: HADOOP-11395
 URL: https://issues.apache.org/jira/browse/HADOOP-11395
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth
Assignee: Chris Nauroth


The scope of this issue is to add site documentation covering our Azure Storage 
FileSystem integration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11396) Provide navigation in the site documentation linking to the Hadoop Compatible File Systems.

2014-12-11 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-11396:
--

 Summary: Provide navigation in the site documentation linking to 
the Hadoop Compatible File Systems.
 Key: HADOOP-11396
 URL: https://issues.apache.org/jira/browse/HADOOP-11396
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Chris Nauroth


We build site documentation for hadoop-aws and hadoop-openstack, and we'll soon 
have documentation for hadoop-azure.  This documentation is not linked from the 
main site though, so unless a user knows the direct URL, they won't be able to 
find it.  This issue proposes adding navigation to the site to make it easier 
to find these documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: submitting a hadoop patch doesn't trigger jenkins test run

2014-12-11 Thread Yongjun Zhang
Hi,

I wonder if anyone can help on resolving HADOOP-11320
https://issues.apache.org/jira/browse/HADOOP-11320 to increase timeout
for jenkins test of crossing-subproject patches?

Thanks a lot,

--Yongjun

On Tue, Dec 2, 2014 at 10:10 AM, Yongjun Zhang yzh...@cloudera.com wrote:

 Hi,

 Thank you all for the input.

 https://issues.apache.org/jira/browse/HADOOP-11320

 was created for this issue. Welcome to give your further comments there.

 Best,

 --Yongjun

 On Tue, Nov 25, 2014 at 10:26 PM, Colin McCabe cmcc...@alumni.cmu.edu
 wrote:

 +1 for increasing the test timeout for tests spanning multiple
 sub-projects.

 I can see the value in what Steve L. suggested... if you make a major
 change that touches a particular subproject, you should try to get the
 approval of a committer who knows that subproject.  But I don't think that
 forcing artificial patch splits is the way to do this...  There are also
 some patches that are completely mechanical and don't really require the
 involvement of YARN / HDFS committer, even if they change that project.
 For example, fixing a misspelling in the name of a hadoop-common API.

 Colin

 On Tue, Nov 25, 2014 at 8:45 AM, Yongjun Zhang yzh...@cloudera.com
 wrote:

  Thanks all for the feedback. To summarize (and I have a suggestion at
 the
  end of this email), there are two scenarios:
 
 1. A change that spans multiple *bigger* projects, e.g. hadoop, hbase.
 2. A change that spans multiple *sub-projects* within hadoop, e.g.,
 common, hdfs, yarn
 
  For 1, it's required for the change to be backward compatible, thus
  splitting change for multiple *bigger* projects is a must.
 
  For 2, there are two sub types,
 
 - 2.1 those changes that can be made within hadoop sub-projects, and
 there is no external impact
 - 2.2 those changes that have external impact, that is, the changes
 involve adding new APIs and marking old API deprecated, and
  corresponding
 changes in other *bigger* projects will have to be made
 independently.
  *But
 the changes within hadoop subjects can still be done altogether.*
 
  I think (Please correct me if I'm wrong):
 
 - What Colin referred to is 2.1 and changes within hadoop
 sub-subjects
 for 2.2;
 - Steve's not for changes across hadoop-common and hdfs, or
 hadoop-common and yarn means 2.1, Steve's  changes that only
 span hdfs-and-yarn would be fairly doubtful too. implies his doubt
 of
 existence of 2.1.
 
  For changes of 2.1 (if any) and *hadoop* changes of 2.2, we do have an
  option of making the change across all hadoop sub-projects altogether,
 to
  save the multiple steps Colin referred to.
 
  If this option is feasible, should we consider increasing the jenkins
  timeout for this kind of changes (I mean making the timeout adjustable,
 if
  it's for single sub-project, use the old timeout; otherwise, increase
  accordingly)  so that we have at least this option when needed?
 
  Thanks.
 
  --Yongjun
 
 
  On Tue, Nov 25, 2014 at 2:28 AM, Steve Loughran ste...@hortonworks.com
 
  wrote:
 
   On 25 November 2014 at 00:58, Bernd Eckenfels e...@zusammenkunft.net
 
   wrote:
  
Hello,
   
 On Mon, 24 Nov 2014 16:16:00 -0800,
 Colin McCabe cmcc...@alumni.cmu.edu wrote:
   
 Conceptually, I think it's important to support patches that
 modify
 multiple sub-projects.  Otherwise refactoring things in common
 becomes a multi-step process.
   
This might be rather philosophical (and I don't want to argue the need
to have the patch infrastructure work for the multi-project case);
however, if a multi-project change cannot be applied in multiple steps
it is probably also not safe at runtime (unless the multiple projects
belong to a single instance/artifact). And then being forced to
commit/compile/test in multiple steps actually increases the
dependencies topology.
   
  
   +1 for changes that span, say hadoop and hbase. but not for changes
  across
   hadoop-common and hdfs, or hadoop-common and yarn. changes that only
 span
   hdfs-and-yarn would be fairly doubtful too.
  
   there is a dependency graph in hadoop's own jars —and cross module
 (not
   cross project) changes do need to happen.
  
  
 





Re: Solaris Port

2014-12-11 Thread malcolm

Hi Asokan,

I googled and found that Windows has strerror and strerror_s (which is 
the strerror_r equivalent).

Is there a reason why you didn't use that call?

On 12/11/2014 06:27 PM, Asokan, M wrote:

Hi Malcom,
Recently, I had to work on a function to get system error message on 
various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
    LPVOID lpMsgBuf;
    DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                 FORMAT_MESSAGE_FROM_SYSTEM |
                                 FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL, code,
                                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                 /* Default language */
                                 (LPTSTR) &lpMsgBuf, 0, NULL);
    if (status > 0)
    {
        strncpy(buf, (char *)lpMsgBuf, buf_len-1);
        buf[buf_len-1] = '\0';
        /* Free the buffer returned by system */
        LocalFree(lpMsgBuf);
    }
    else
    {
        _snprintf(buf, buf_len-1, "%s %d",
                  "Can't get system error message for code", code);
        buf[buf_len-1] = '\0';
    }
#else
#if defined(_HPUX_SOURCE)
    {
        char * msg;
        errno = 0;
        msg = strerror(code);
        if (errno == 0)
        {
            strncpy(buf, msg, buf_len-1);
            buf[buf_len-1] = '\0';
        }
        else
        {
            snprintf(buf, buf_len, "%s %d",
                     "Can't get system error message for code", code);
        }
    }
#else
    if (strerror_r(code, buf, buf_len) != 0)
    {
        snprintf(buf, buf_len, "%s %d",
                 "Can't get system error message for code", code);
    }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is 
thread-safe.  Also Windows does not have snprintf().  The equivalent function 
_snprintf() has a subtle difference in its interface.

-- Asokan

From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:

sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
will mean the libhadoop.so will need to be tied a specific build 
(kernel/include pairing) and therefore limits upward mobility/compatibility.  
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since other than the brain-dead 
GNU libc version, is highly portable and should work regardless of the kernel 
or OS in place.

On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:


FYI, there are a couple more files that reference sys_errlist directly (not 
just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c

On 12/11/2014 07:38 AM, malcolm wrote:

Hi Colin,

Exactly, as you noticed, the problem is the thread-local buffer needed to 
return from terror.
Currently, terror just returns a static string from an array, this is fast, 
simple and error-proof.

In order to use strerror_r inside terror,  would require allocating a buffer 
inside terror  and depend on the caller to free the buffer after using it, or 
to pass a buffer to terrror (which is basically the same as strerror_r, 
rendering terror redundant).
Both cases require modification outside terror itself, as far as I can tell, no 
simple fix. Unless you have an alternative which I haven't thought of ?

As far as I can tell, we have two choices:

1. Remove terror and replace calls with strerror_r, passing a buffer from the 
callee.
 Advantage: a more modern portable interface.
 Disadvantage: All calls to terror need to be modified, though all seem to 
be in a few files as far as I can tell.

2. Adding a sys_errlist array (ifdeffed for Solaris)
 Advantage: no change to any calls to terror
 Disadvantage: 2 additional files added to source tree (.c and .h) and some 
minor ifdefs only used for Solaris.

I think it is more a question of style than anything else, so I leave you to 
make the call.

Thanks for your patience,
Malcolm





On 12/10/2014 09:54 PM, Colin McCabe wrote:

On Wed, Dec 10, 2014 at 2:31 AM, malcolm malcolm.kaval...@oracle.com wrote:

Hi Colin,

Thanks for the hints around JIRAs.

You are correct errno still exists, however sys_errlist does not.

Hadoop uses a function terror (defined in exception.c) which indexes
sys_errlist by errno to return the error message from the array. This
function is called 26 times in various places (in 2.2)

Originally, I thought to replace all calls to terror with strerror, but
there can be issues with multi-threading (it returns a buffer which can be
overwritten), so it seemed simpler just to recreate the 

[jira] [Created] (HADOOP-11397) Can't override HADOOP_IDENT_STRING

2014-12-11 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-11397:
-

 Summary: Can't override HADOOP_IDENT_STRING
 Key: HADOOP-11397
 URL: https://issues.apache.org/jira/browse/HADOOP-11397
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Allen Wittenauer
Priority: Trivial


Simple typo in hadoop_basic_init:

  HADOOP_IDENT_STRING=${HADOP_IDENT_STRING:-$USER}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: Solaris Port

2014-12-11 Thread Asokan, M
Hi Malcolm,
  The Windows versions of the strerror() and strerror_s() functions are probably 
meant for ANSI C library functions that set errno.  For core Windows API calls 
(like UNIX system calls), one gets the error number by calling the GetLastError() 
function.  In the code snippet I sent earlier, the "code" argument is the value 
returned by GetLastError().  Neither strerror() nor strerror_s() will give the 
correct error message for this error code.

You could probably look at libwinutils.c in the Hadoop source.  It uses 
FormatMessageW (which returns messages in Unicode).  My requirement was to 
return messages in the current system locale.

-- Asokan

From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 4:04 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Hi Asok,

I googled and found that windows has strerror, and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call ?

On 12/11/2014 06:27 PM, Asokan, M wrote:
 Hi Malcom,
 Recently, I had to work on a function to get system error message on 
 various systems.  Here is the piece of code I came up with.  Hope it helps.

 static void get_system_error_message(char * buf, int buf_len, int code)
 {
 #if defined(_WIN32)
     LPVOID lpMsgBuf;
     DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                  FORMAT_MESSAGE_FROM_SYSTEM |
                                  FORMAT_MESSAGE_IGNORE_INSERTS,
                                  NULL, code,
                                  MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                  /* Default language */
                                  (LPTSTR) &lpMsgBuf, 0, NULL);
     if (status > 0)
     {
         strncpy(buf, (char *)lpMsgBuf, buf_len-1);
         buf[buf_len-1] = '\0';
         /* Free the buffer returned by system */
         LocalFree(lpMsgBuf);
     }
     else
     {
         _snprintf(buf, buf_len-1, "%s %d",
                   "Can't get system error message for code", code);
         buf[buf_len-1] = '\0';
     }
 #else
 #if defined(_HPUX_SOURCE)
     {
         char * msg;
         errno = 0;
         msg = strerror(code);
         if (errno == 0)
         {
             strncpy(buf, msg, buf_len-1);
             buf[buf_len-1] = '\0';
         }
         else
         {
             snprintf(buf, buf_len, "%s %d",
                      "Can't get system error message for code", code);
         }
     }
 #else
     if (strerror_r(code, buf, buf_len) != 0)
     {
         snprintf(buf, buf_len, "%s %d",
                  "Can't get system error message for code", code);
     }
 #endif
 #endif
 }

 Note that HPUX does not have strerror_r() since strerror() itself is 
 thread-safe.  Also Windows does not have snprintf().  The equivalent function 
 _snprintf() has a subtle difference in its interface.

 -- Asokan
 
 From: malcolm [malcolm.kaval...@oracle.com]
 Sent: Thursday, December 11, 2014 11:02 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: Solaris Port

 Fine with me, I volunteer to do this, if accepted.

 On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
 sys_errlist was removed for a reason.  Creating a fake sys_errlist on 
 Solaris will mean the libhadoop.so will need to be tied a specific build 
 (kernel/include pairing) and therefore limits upward mobility/compatibility. 
  That doesn’t seem like a very good idea.

 IMO, switching to strerror_r is much preferred, since other than the 
 brain-dead GNU libc version, is highly portable and should work regardless 
 of the kernel or OS in place.

 On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:

 FYI, there are a couple more files that reference sys_errlist directly (not 
 just terror within exception.c) , but also hdfs_http_client.c and NativeiO.c

 On 12/11/2014 07:38 AM, malcolm wrote:
 Hi Colin,

 Exactly, as you noticed, the problem is the thread-local buffer needed to 
 return from terror.
 Currently, terror just returns a static string from an array, this is 
 fast, simple and error-proof.

 In order to use strerror_r inside terror,  would require allocating a 
 buffer inside terror  and depend on the caller to free the buffer after 
 using it, or to pass a buffer to terrror (which is basically the same as 
 strerror_r, rendering terror redundant).
 Both cases require modification outside terror itself, as far as I can 
 tell, no simple fix. Unless you have an alternative which I haven't 
 thought of ?

 As far as I can tell, we have two choices:

 1. Remove terror and replace calls with strerror_r, passing a buffer from 
 the callee.
  Advantage: a more modern portable interface.
  Disadvantage: All calls to terror need to be modified, though all 
 seem to be in a few files as far as I can tell.

 2. Adding a sys_errlist array (ifdeffed for Solaris)
  

[jira] [Created] (HADOOP-11398) RetryUpToMaximumTimeWithFixedSleep needs to behave more accurately

2014-12-11 Thread Li Lu (JIRA)
Li Lu created HADOOP-11398:
--

 Summary: RetryUpToMaximumTimeWithFixedSleep needs to behave more 
accurately
 Key: HADOOP-11398
 URL: https://issues.apache.org/jira/browse/HADOOP-11398
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu


RetryUpToMaximumTimeWithFixedSleep now inherits 
RetryUpToMaximumCountWithFixedSleep and just acts as a wrapper that decides 
maxRetries. The current implementation uses (maxTime / sleepTime) as the number 
of maxRetries. This is fine if the actual time for each retry is significantly 
less than the sleep time, but it becomes less accurate if each retry takes a 
comparable amount of time to the sleep time. The problem gets worse when there 
are underlying retries. 

We may want to use timers inside RetryUpToMaximumTimeWithFixedSleep to perform 
accurate timing. 
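
As a rough illustration of the timer-based alternative (hypothetical code, not 
the actual o.a.h.io.retry implementation), the loop could track elapsed 
wall-clock time against a deadline instead of deriving a retry count from 
maxTime / sleepTime:

{code}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

public final class DeadlineRetry {
  // Retry call() with a fixed sleep until it succeeds or maxTimeMs of
  // wall-clock time has elapsed, however long each attempt takes.
  public static <T> T retry(Callable<T> call, long maxTimeMs, long sleepMs)
      throws Exception {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxTimeMs);
    while (true) {
      try {
        return call.call();
      } catch (IOException e) {
        if (System.nanoTime() >= deadline) {
          throw e;               // time budget exhausted
        }
        Thread.sleep(sleepMs);   // fixed sleep between attempts
      }
    }
  }
}
{code}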



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: submitting a hadoop patch doesn't trigger jenkins test run

2014-12-11 Thread Yongjun Zhang
Many thanks to Ted Yu, Steve Loughran and Andrew Wang for replying in the
jira and Steve/Andrew for making the related changes!

--Yongjun

On Thu, Dec 11, 2014 at 12:41 PM, Yongjun Zhang yzh...@cloudera.com wrote:

 Hi,

 I wonder if anyone can help on resolving HADOOP-11320
 https://issues.apache.org/jira/browse/HADOOP-11320 to increase timeout
 for jenkins test of crossing-subproject patches?

 Thanks a lot,

 --Yongjun

 On Tue, Dec 2, 2014 at 10:10 AM, Yongjun Zhang yzh...@cloudera.com
 wrote:

 Hi,

 Thank you all for the input.

 https://issues.apache.org/jira/browse/HADOOP-11320

 was created for this issue. Welcome to give your further comments there.

 Best,

 --Yongjun

 On Tue, Nov 25, 2014 at 10:26 PM, Colin McCabe cmcc...@alumni.cmu.edu
 wrote:

 +1 for increasing the test timeout for tests spanning multiple
 sub-projects.

 I can see the value in what Steve L. suggested... if you make a major
 change that touches a particular subproject, you should try to get the
 approval of a committer who knows that subproject.  But I don't think
 that
 forcing artificial patch splits is the way to do this...  There are also
 some patches that are completely mechanical and don't really require the
 involvement of YARN / HDFS committer, even if they change that project.
 For example, fixing a misspelling in the name of a hadoop-common API.

 Colin

 On Tue, Nov 25, 2014 at 8:45 AM, Yongjun Zhang yzh...@cloudera.com
 wrote:

  Thanks all for the feedback. To summarize (and I have a suggestion at
 the
  end of this email), there are two scenarios:
 
 1. A change that span multiple *bigger* projects. r.g. hadoop,
 hbase.
 2. A change that span multiple *sub* projects* within hadoop, e.g.,
 common, hdfs, yarn
 
  For 1, it's required for the change to be backward compatible, thus
  splitting change for multiple *bigger* projects is a must.
 
  For 2, there are two sub types,
 
 - 2.1 those changes that can be made within hadoop sub-projects, and
 there is no external impact
 - 2.2 those changes that have external impact, that is, the changes
 involve adding new APIs and marking old API deprecated, and
  corresponding
 changes in other *bigger* projects will have to be made
 independently.
  *But
 the changes within hadoop subjects can still be done altogether.*
 
  I think (Please correct me if I'm wrong):
 
 - What Colin referred to is 2.1 and changes within hadoop
 sub-subjects
 for 2.2;
 - Steve's not for changes across hadoop-common and hdfs, or
 hadoop-common and yarn means 2.1, Steve's  changes that only
 span hdfs-and-yarn would be fairly doubtful too. implies his doubt
 of
 existence of 2.1.
 
  For changes of 2.1 (if any) and *hadoop* changes of 2.2, we do have an
  option of making the change across all hadoop sub-projects altogether,
 to
  save the multiple steps Colin referred to.
 
  If this option is feasible, should we consider increasing the jenkins
  timeout for this kind of changes (I mean making the timeout
 adjustable, if
  it's for single sub-project, use the old timeout; otherwise, increase
  accordingly)  so that we have at least this option when needed?
 
  Thanks.
 
  --Yongjun
 
 
  On Tue, Nov 25, 2014 at 2:28 AM, Steve Loughran 
 ste...@hortonworks.com
  wrote:
 
   On 25 November 2014 at 00:58, Bernd Eckenfels 
 e...@zusammenkunft.net
   wrote:
  
Hello,
   
Am Mon, 24 Nov 2014 16:16:00 -0800
schrieb Colin McCabe cmcc...@alumni.cmu.edu:
   
 Conceptually, I think it's important to support patches that
 modify
 multiple sub-projects.  Otherwise refactoring things in common
 becomes a multi-step process.
   
This might be rather philosophical (and I dont want to argue the
 need
to have the patch infrastructure work for the multi-project case),
howevere if a multi-project change cannot be applied in multiple
 steps
it is probably also not safe at runtime (unless the multiple
 projects
belong to a single instance/artifact). And then beeing forced to
commit/compile/test in multiple steps actually increases the
dependencies topology.
   
  
   +1 for changes that span, say hadoop and hbase. but not for changes
  across
   hadoop-common and hdfs, or hadoop-common and yarn. changes that only
 span
   hdfs-and-yarn would be fairly doubtful too.
  
   there is a dependency graph in hadoop's own jars —and cross module
 (not
   cross project) changes do need to happen.
  

Re: submitting a hadoop patch doesn't trigger jenkins test run

2014-12-11 Thread Yongjun Zhang
Sorry, my bad: I named Andrew Wang instead of Andrew Bayer in my last mail; both of
them helped anyway :-) So thanks to all for the help on this matter!

--Yongjun

On Thu, Dec 11, 2014 at 3:38 PM, Yongjun Zhang yzh...@cloudera.com wrote:

 Many thanks to Ted Yu, Steve Loughran and Andrew Wang for replying in the
 jira and Steve/Andrew for making the related changes!

 --Yongjun

 On Thu, Dec 11, 2014 at 12:41 PM, Yongjun Zhang yzh...@cloudera.com
 wrote:

 Hi,

 I wonder if anyone can help on resolving HADOOP-11320
 https://issues.apache.org/jira/browse/HADOOP-11320 to increase timeout
 for jenkins test of crossing-subproject patches?

 Thanks a lot,

 --Yongjun

 On Tue, Dec 2, 2014 at 10:10 AM, Yongjun Zhang yzh...@cloudera.com
 wrote:

 Hi,

 Thank you all for the input.

 https://issues.apache.org/jira/browse/HADOOP-11320

 was created for this issue. Welcome to give your further comments there.

 Best,

 --Yongjun

 On Tue, Nov 25, 2014 at 10:26 PM, Colin McCabe cmcc...@alumni.cmu.edu
 wrote:

 +1 for increasing the test timeout for tests spanning multiple
 sub-projects.

 I can see the value in what Steve L. suggested... if you make a major
 change that touches a particular subproject, you should try to get the
 approval of a committer who knows that subproject.  But I don't think
 that
 forcing artificial patch splits is the way to do this...  There are also
 some patches that are completely mechanical and don't really require the
 involvement of a YARN / HDFS committer, even if they change that project.
 For example, fixing a misspelling in the name of a hadoop-common API.

 Colin

 On Tue, Nov 25, 2014 at 8:45 AM, Yongjun Zhang yzh...@cloudera.com
 wrote:

  Thanks all for the feedback. To summarize (and I have a suggestion at
 the
  end of this email), there are two scenarios:
 
 1. A change that spans multiple *bigger* projects, e.g. hadoop,
 hbase.
 2. A change that spans multiple *sub-projects* within hadoop, e.g.,
 common, hdfs, yarn
 
  For 1, it's required for the change to be backward compatible, thus
  splitting the change across the multiple *bigger* projects is a must.
 
  For 2, there are two sub types,
 
 - 2.1 those changes that can be made within hadoop sub-projects,
 and
 there is no external impact
 - 2.2 those changes that have external impact, that is, the changes
 involve adding new APIs and marking old APIs deprecated, and
 corresponding changes in other *bigger* projects will have to be made
 independently. *But the changes within hadoop sub-projects can still be
 done altogether.*
 
  I think (Please correct me if I'm wrong):
 
 - What Colin referred to is 2.1, and changes within hadoop
 sub-projects for 2.2;
 - Steve's "not for changes across hadoop-common and hdfs, or
 hadoop-common and yarn" means 2.1; Steve's "changes that only
 span hdfs-and-yarn would be fairly doubtful too" implies his doubt
 of the existence of 2.1.
 
  For changes of 2.1 (if any) and *hadoop* changes of 2.2, we do have an
  option of making the change across all hadoop sub-projects
 altogether, to
  save the multiple steps Colin referred to.
 
  If this option is feasible, should we consider increasing the jenkins
  timeout for this kind of change (I mean making the timeout adjustable:
  if it's for a single sub-project, use the old timeout; otherwise, increase
  it accordingly) so that we have at least this option when needed?
 
  Thanks.
 
  --Yongjun
 
 
  On Tue, Nov 25, 2014 at 2:28 AM, Steve Loughran 
 ste...@hortonworks.com
  wrote:
 
   On 25 November 2014 at 00:58, Bernd Eckenfels 
 e...@zusammenkunft.net
   wrote:
  
Hello,
   
Am Mon, 24 Nov 2014 16:16:00 -0800
schrieb Colin McCabe cmcc...@alumni.cmu.edu:
   
 Conceptually, I think it's important to support patches that
 modify
 multiple sub-projects.  Otherwise refactoring things in common
 becomes a multi-step process.
   
This might be rather philosophical (and I don't want to argue the need
to have the patch infrastructure work for the multi-project case),
however if a multi-project change cannot be applied in multiple steps
it is probably also not safe at runtime (unless the multiple projects
belong to a single instance/artifact). And then being forced to
commit/compile/test in multiple steps actually increases the
dependencies topology.
   
  
   +1 for changes that span, say hadoop and hbase. but not for changes
  across
   hadoop-common and hdfs, or hadoop-common and yarn. changes that
 only span
   hdfs-and-yarn would be fairly doubtful too.
  
   there is a dependency graph in hadoop's own jars —and cross module
 (not
   cross project) changes do need to happen.
  

[jira] [Resolved] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common

2014-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11389.
-
Resolution: Fixed

I've committed the patch to trunk and branch-2.

 Clean up byte to string encoding issues in hadoop-common
 

 Key: HADOOP-11389
 URL: https://issues.apache.org/jira/browse/HADOOP-11389
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HADOOP-11389.000.patch, HADOOP-11389.001.patch


 Much code in hadoop-common converts bytes to strings using the default charset. 
 The behavior of the conversion depends on the platform's encoding settings, 
 which is flagged by newer versions of findbugs. This jira proposes to fix the 
 findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Solaris Port

2014-12-11 Thread malcolm
So, it turns out that if I had naively changed all calls to terror, or 
references to sys_errlist, to use strerror_r, then I would have broken the 
code for Windows and HPUX (and possibly other OSes).
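
(For reference, the terror() helper in exception.c is roughly the sketch
below; it is written from memory, so the real Hadoop native code may differ
in detail. It shows why strerror_r is not a drop-in replacement: terror()
hands back a pointer into sys_errlist, while strerror_r expects a
caller-supplied buffer, so every call site would change shape.)

/* Rough, from-memory sketch of terror() in exception.c; the actual
 * implementation in the Hadoop native source may differ.  It indexes
 * sys_errlist directly, which is exactly what stops working where
 * sys_errlist has been removed. */
const char* terror(int errnum)
{
    if ((errnum < 0) || (errnum >= sys_nerr)) {
        return "unknown error.";
    }
    return sys_errlist[errnum];
}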


If we are to assume that the current code runs fine on all platforms (maybe 
even AIX and MacOS, for example), then any changes/additions made to the 
code and not ifdeffed appropriately can break other OSes. On the 
other hand, too many ifdefs can pollute the source code and render it 
less readable (though that is possibly less important).


In the general case, what are code contributors' responsibilities when 
adding code with regard to OSes besides Linux?

What OSes does jenkins test on?
I guess maintainers of code on non-tested platforms are responsible for 
their own testing?


How do we avoid the ping-pong effect, i.e. I make a generic change to the 
code which breaks on Windows, then the Windows maintainer reverts the 
change and breaks Solaris again, for example? Or does this not happen in 
actuality?


On 12/11/2014 11:25 PM, Asokan, M wrote:

Hi Malcolm,
   The Windows versions of the strerror() and strerror_s() functions are probably meant for 
errors from ANSI C library functions that set errno.  For core Windows API calls (like UNIX system 
calls), one gets the error number by calling the GetLastError() function.  In the code 
snippet I sent earlier, the "code" argument is the value returned by 
GetLastError().  Neither strerror() nor strerror_s() will give the correct error message 
for this error code.

You could probably look at libwinutils.c in the Hadoop source.  It uses 
FormatMessageW (which returns messages in Unicode).  My requirement was to 
return messages in the current system locale.

-- Asokan

From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 4:04 PM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Hi Asokan,

I googled and found that Windows has strerror and strerror_s (which is
the strerror_r equivalent).
Is there a reason why you didn't use this call?

On 12/11/2014 06:27 PM, Asokan, M wrote:

Hi Malcolm,
 Recently, I had to work on a function to get the system error message on 
various systems.  Here is the piece of code I came up with.  Hope it helps.

static void get_system_error_message(char * buf, int buf_len, int code)
{
#if defined(_WIN32)
    LPVOID lpMsgBuf;
    DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                                 FORMAT_MESSAGE_FROM_SYSTEM |
                                 FORMAT_MESSAGE_IGNORE_INSERTS,
                                 NULL, code,
                                 MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                                 /* Default language */
                                 (LPTSTR) &lpMsgBuf, 0, NULL);
    if (status > 0)
    {
        strncpy(buf, (char *)lpMsgBuf, buf_len-1);
        buf[buf_len-1] = '\0';
        /* Free the buffer returned by the system */
        LocalFree(lpMsgBuf);
    }
    else
    {
        _snprintf(buf, buf_len-1, "%s %d",
                  "Can't get system error message for code", code);
        buf[buf_len-1] = '\0';
    }
#else
#if defined(_HPUX_SOURCE)
    {
        char * msg;
        errno = 0;
        msg = strerror(code);
        if (errno == 0)
        {
            strncpy(buf, msg, buf_len-1);
            buf[buf_len-1] = '\0';
        }
        else
        {
            snprintf(buf, buf_len, "%s %d",
                     "Can't get system error message for code", code);
        }
    }
#else
    if (strerror_r(code, buf, buf_len) != 0)
    {
        snprintf(buf, buf_len, "%s %d",
                 "Can't get system error message for code", code);
    }
#endif
#endif
}

Note that HPUX does not have strerror_r() since strerror() itself is 
thread-safe.  Also Windows does not have snprintf().  The equivalent function 
_snprintf() has a subtle difference in its interface.
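
For what it's worth, a minimal usage sketch of the function above might look
like the following (report_failure() and the buffer size are illustrative
assumptions, not part of the original snippet; on Windows, <windows.h> is
assumed to be included for GetLastError()):

#include <stdio.h>
#include <errno.h>
/* plus the header that declares get_system_error_message() */

static void report_failure(void)
{
    char msg[256];
#if defined(_WIN32)
    int err = (int) GetLastError();   /* Windows API error code */
#else
    int err = errno;                  /* POSIX errno from the failed call */
#endif
    get_system_error_message(msg, sizeof(msg), err);
    fprintf(stderr, "operation failed: %s\n", msg);
}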

-- Asokan

From: malcolm [malcolm.kaval...@oracle.com]
Sent: Thursday, December 11, 2014 11:02 AM
To: common-dev@hadoop.apache.org
Subject: Re: Solaris Port

Fine with me, I volunteer to do this, if accepted.

On 12/11/2014 05:48 PM, Allen Wittenauer wrote:

sys_errlist was removed for a reason.  Creating a fake sys_errlist on Solaris 
will mean the libhadoop.so will need to be tied to a specific build 
(kernel/include pairing) and therefore limits upward mobility/compatibility.  
That doesn’t seem like a very good idea.

IMO, switching to strerror_r is much preferred, since, other than the brain-dead 
GNU libc version, it is highly portable and should work regardless of the kernel 
or OS in place.
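
To illustrate the glibc wrinkle: the GNU strerror_r returns a char* (which
may not point into the supplied buffer), while the POSIX variant returns an
int status. A hedged sketch of a wrapper that copes with both is below; the
feature-test macros used for detection are an assumption and may need
adjusting to the actual build flags.

#include <stdio.h>
#include <string.h>

static void portable_strerror(int code, char *buf, size_t buf_len)
{
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: returns a char* that may point at a static string
     * rather than into buf, so copy it over when needed. */
    const char *msg = strerror_r(code, buf, buf_len);
    if (msg != buf) {
        snprintf(buf, buf_len, "%s", msg);
    }
#else
    /* POSIX variant: returns 0 on success, an error number on failure. */
    if (strerror_r(code, buf, buf_len) != 0) {
        snprintf(buf, buf_len, "Unknown error %d", code);
    }
#endif
}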

On Dec 11, 2014, at 5:20 AM, malcolm malcolm.kaval...@oracle.com wrote:


FYI, there are a couple more files that reference sys_errlist directly (not 
just terror within exception.c): hdfs_http_client.c and NativeIO.c do as well.

On 12/11/2014 07:38 AM, malcolm