Re: Solaris Port SOLVED!
On 16 December 2014 at 16:01, malcolm wrote:

> 1. Findbugs, 3 warnings in Java code (which of course I did not touch)
> 2. Test failures also with no connection to terror: A java socket timeout,

These are ongoing issues with (1) the transition to Java 7 builds and (2) some intermittent tests that need to get fixed. Ignore them.
Re: Solaris Port SOLVED!
On 12/16/2014 11:01 AM, malcolm wrote:

> This is weird, Jenkins complains about:
>
> 1. Findbugs, 3 warnings in Java code (which of course I did not touch)

The FB warnings seem to be a recent phenomenon. I have seen them on a recent test run of my own and they come and go depending on the run. I think they can be safely ignored. However, if you want to be sure, then you could do the findbugs run on your local machine both with and without your patch applied and compare the results. If you find that there's no difference, then just put a comment in the Jira stating that.

> 2. Test failures also with no connection to terror: A java socket timeout,

Yes, probably unrelated. To be sure, run those same tests on your local machine and if they pass, then put a comment in the Jira saying that they run on your local machine. If they fail, then run them with and without the patch to make sure they fail both ways.

Charles
Re: Solaris Port SOLVED!
This is weird, Jenkins complains about:

1. Findbugs, 3 warnings in Java code (which of course I did not touch)
2. Test failures also with no connection to terror: A java socket timeout,

As a newbie, I am not quite sure how to relate to this. (I could just revert the code back, and see if I get the same errors anyway.)
Re: Solaris Port SOLVED!
Done, and added the comment as you requested.

I attached a second patch file to the JIRA (with .002 appended as per convention) assuming Jenkins knows to take the latest version, since I understand that I cannot remove the previous patch file.
Re: Solaris Port SOLVED!
Thanks, Malcolm. I reviewed it. The only thing you still have to do is hit "submit patch" to get a Jenkins run. See our HowToContribute wiki page for more details.

wiki.apache.org/hadoop/HowToContribute

best,
Colin
Re: Solaris Port SOLVED!
I am checking on the latest release of Solaris 11 and yes, it is still thread safe (or MT Safe as documented on the man page).

strerror checks the error code, and returns the same "unknown error" string as terror does, if it receives an invalid code. I checked this on Windows, Solaris and Linux (though my changes only affect Solaris platforms).

JIRA newbie question:

I have filed the JIRA attaching the patch HADOOP-11403 against the trunk, asking for reviewers in the comments section. Is there any other protocol I should follow?

Thanks,
Malcolm
RE: Solaris Port SOLVED!
Malcolm,
That's great! Is strerror() thread-safe in the recent version of Solaris? In any case, to be correct you still need to make sure that the code passed to strerror() is a valid one. For this you need to check errno after the call to strerror(). Please check the code snippet I sent earlier for HPUX.

-- Asokan
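The errno check suggested above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name checked_strerror, its buffer parameters and the fallback text are hypothetical and do not come from the Hadoop sources, and it assumes strerror() itself is MT-Safe on the target platform (as reported in this thread for Solaris and HP-UX). Whether an invalid code yields NULL, sets errno, or simply returns a generic string varies by platform, so the sketch tolerates all three.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Hadoop sources): copies the message for
 * `code` into the caller's buffer, falling back to a generic text when the
 * platform reports the code as invalid.
 *
 * Assumption: strerror() is MT-Safe on the target platform. That is a
 * platform property, not something this function can guarantee.
 */
const char *checked_strerror(int code, char *buf, size_t buf_len)
{
    const char *msg;

    assert(buf != NULL && buf_len > 0);

    errno = 0;              /* clear first so a stale value is not misread */
    msg = strerror(code);

    if (msg == NULL || errno != 0) {
        /* Some platforms return NULL, others set errno (typically EINVAL). */
        snprintf(buf, buf_len, "unknown error %d", code);
    } else {
        /* snprintf truncates safely and always NUL-terminates. */
        snprintf(buf, buf_len, "%s", msg);
    }
    return buf;
}
```

A caller would declare a local buffer, for example `char msg[256];`, and pass it as `checked_strerror(errno, msg, sizeof msg)`. Keeping the buffer in the caller avoids any shared state between threads.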
Re: Solaris Port SOLVED!
Wiping egg off face ...

After consulting with the Solaris team (and looking at the source code and man page), it turns out that strerror itself on Solaris is MT-Safe! (Just like HPUX)

So, after all this effort, all I need to do is modify terror as follows:

const char* terror(int errnum)
{
#if defined(__sun)
    return strerror(errnum); // MT-Safe under Solaris
#else
    if ((errnum < 0) || (errnum >= sys_nerr)) {
        return "unknown error.";
    }
    return sys_errlist[errnum];
#endif
}

And in two other files where sys_errlist is referenced directly (NativeIO and hdfs_http_client.c), I replaced the direct access with a call to terror.

Thanks for all your help and patience,

I'll file a JIRA asap,

Cheers,
Malcolm

On 12/13/2014 05:26 PM, malcolm wrote:
> Thanks Asokan,
>
> Looked up Gcc's thread local variables, seems a bit complex though and quite specific to Gnu.
>
> Initialization of the static errlist array should be thread safe, i.e. initially the array is nulled out, and afterwards if two threads write to the same address, then they would be writing the same string.
>
> But if we are ok with changing 5 files, not just terror, then I would just remove terror completely and use strerror_r (or the alternatives for Windows and HP_UX) in the caller code instead, i.e. using your suggestion for a local buffer in the caller, wherever needed. The more I think about it, the more this seems to be the right thing to do.
>
> Cheers,
> Malcolm
>
> On 12/13/2014 04:38 PM, Asokan, M wrote:
>> Malcolm,
>> Gcc supports thread-local variables. See
>>
>> https://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Thread-Local.html
>>
>> I am not sure about native compilers on Solaris, HPUX, or AIX.
>>
>> In any case, I found out that the Windows native code in Hadoop seems to handle error messages properly.
>> Here is what I found:
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep FormatMessage|awk -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMappingWin.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
>>
>> $ find ~/work/hadoop/hadoop-trunk/ -name '*.c'|xargs grep terror|awk -F: '{print $1}'|sort -u
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/exception.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocket.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c
>> /home/asokan/work/hadoop/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsMapping.c
>>
>> This means you need not worry about the Windows version of terror(). You need to change five files that contain UNIX specific native code.
>>
>> I have a question on your suggested implementation:
>>
>> How do you initialize the static errlist array in a thread-safe manner?
>> Here is another thread-safe implementation that I could come up with:
>>
>> #include <errno.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> #define MESSAGE_BUFFER_SIZE 256
>>
>> char * getSystemErrorMessage(char * buf, int buf_len, int code) {
>> #if defined(_HPUX_SOURCE)
>>   char * msg;
>>   errno = 0;
>>   msg = strerror(code);
>>   if (errno == 0) {
>>     strncpy(buf, msg, buf_len-1);
>>     buf[buf_len-1] = '\0';
>>   } else {
>>     snprintf(buf, buf_len, "%s %d",
>>              "Can't get system error message for code", code);
>>   }
>> #else
>>   /* Relies on the XSI strerror_r, which returns an int (0 on success). */
>>   if (strerror_r(code, buf, buf_len) != 0) {
>>     snprintf(buf, buf_len, "%s %d",
>>              "Can't get system error message for code", code);
>>   }
>> #endif
>>   return buf;
>> }
>>
>> #define TERROR(code) \
>>   getSystemErrorMessage(messageBuffer, sizeof(messageBuffer), code)
>>
>> int main(int argc, char ** argv) {
>>   if (argc > 1) {
>>     char messageBuffer[MESSAGE_BUFFER_SIZE];
>>     int code = atoi(argv[1]);
>>     fprintf(stderr, "System error for code %s: %s\n", argv[1], TERROR(code));
>>   }
>>   return 0;
>> }
>>
>> This changes terror to a macro TERROR and requires all functions that call the TERROR macro to declare the local variable messageBuffer. Since there are only five files to modify, I think it is not a big effort. What do you think?
>>
>> -- Asokan
>>
>> On 12/13/2014 04:29 AM, malcolm wrote:
>>> Colin,
>>> I am not sure what you mean by a thread-local buffer (in native code). In Java this is pretty standard, but I couldn't find any implementation for C code.
>>>
>>> Here is the terror function:
>>>
>>> const char* terror(int errnum)
>>> {
>>>   if ((errnum
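For the thread-local buffer question raised here, one possible shape in C is sketched below. It is an illustration under stated assumptions, not the patch that was filed: terror_tls and TERROR_BUF_SIZE are hypothetical names, _Thread_local requires a C11 compiler (older GCC spells it __thread, per the GCC page linked in the thread), and the preprocessor branch exists because glibc with _GNU_SOURCE exposes a strerror_r that returns char* rather than int.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request strerror_r() under strict -std=c11 */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TERROR_BUF_SIZE 256      /* hypothetical size, chosen for illustration */

/*
 * Hypothetical variant of terror(): one buffer per thread, so callers keep
 * the simple `const char *` interface without sharing state across threads.
 * Assumptions: a C11 compiler and a platform that provides strerror_r().
 */
const char *terror_tls(int errnum)
{
    static _Thread_local char buf[TERROR_BUF_SIZE];

#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU strerror_r returns a char* that may or may not point into buf. */
    const char *msg = strerror_r(errnum, buf, sizeof(buf));
    assert(msg != NULL);
    if (msg != buf) {
        snprintf(buf, sizeof(buf), "%s", msg);
    }
#else
    /* XSI strerror_r returns 0 on success and a nonzero value otherwise. */
    if (strerror_r(errnum, buf, sizeof(buf)) != 0) {
        snprintf(buf, sizeof(buf), "unknown error %d", errnum);
    }
#endif
    return buf;
}
```

Each thread gets its own buffer, so existing call sites would not need a local messageBuffer. The trade-off is that the returned text is overwritten by the next call on the same thread, so a caller that needs two messages at once has to copy the first.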