[jira] [Commented] (HADOOP-11505) hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases
[ https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545322#comment-14545322 ] Binglin Chang commented on HADOOP-11505: Yes, that's what HADOOP-11665 did hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases --- Key: HADOOP-11505 URL: https://issues.apache.org/jira/browse/HADOOP-11505 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Labels: BB2015-05-TBR Attachments: HADOOP-11505.001.patch hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases. Also, on some alternate, non-x86, non-ARM architectures the generated code is incorrect. Thanks to Steve Loughran and Edward Nevill for finding this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359756#comment-14359756 ] Binglin Chang commented on HADOOP-11665: Looks like [~Ayappan] marked this as a blocker; maybe he requires this as a bugfix? Provide and unify cross platform byteorder support in native code - Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.4.1, 2.6.0 Environment: PowerPC Big Endian other Big Endian platforms Reporter: Binglin Chang Assignee: Binglin Chang Priority: Blocker Attachments: HADOOP-11665.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-11665: --- Priority: Minor (was: Blocker) Provide and unify cross platform byteorder support in native code - Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.4.1, 2.6.0 Environment: PowerPC Big Endian other Big Endian platforms Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-11665.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10846) DataChecksum#calculateChunkedSums not working for PPC when buffers not backed by array
[ https://issues.apache.org/jira/browse/HADOOP-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344759#comment-14344759 ] Binglin Chang commented on HADOOP-10846: Hi Ayappan, thanks for the patch. When validating on macosx, I got a compile error like:
{code}
[exec] Building C object CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/util/bulk_crc32.c.o
[exec] /usr/bin/cc -Dhadoop_EXPORTS -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -isysroot /Applications/Xcode.app/Contents/Develo...
[exec] /Volumes/SSD/projects/hadoop-trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c:34:10: fatal error: 'byteswap.h' file not found
[exec] #include <byteswap.h>
[exec] 1 error generated.
[exec] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/util/bulk_crc32.c.o] Error 1
[exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
[exec] make: *** [all] Error 2
{code}
I think a more standard way of handling byteorder stuff is needed (not only in this jira), like google did in many open sourced code bases: https://github.com/google/flatbuffers/blob/master/include/flatbuffers/flatbuffers.h DataChecksum#calculateChunkedSums not working for PPC when buffers not backed by array -- Key: HADOOP-10846 URL: https://issues.apache.org/jira/browse/HADOOP-10846 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.4.1, 2.5.2 Environment: PowerPC platform Reporter: Jinghui Wang Assignee: Ayappan Attachments: HADOOP-10846-v1.patch, HADOOP-10846-v2.patch, HADOOP-10846-v3.patch, HADOOP-10846-v4.patch, HADOOP-10846.patch Got the following exception when running Hadoop on Power PC. The implementation for computing checksum when the data buffer and checksum buffer are not backed by arrays. 
13/09/16 04:06:57 ERROR security.UserGroupInformation: PriviledgedActionException as:biadmin (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): org.apache.hadoop.fs.ChecksumException: Checksum error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-11665: --- Assignee: Binglin Chang Status: Patch Available (was: Open) Provide and unify cross platform byteorder support in native code - Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344983#comment-14344983 ] Binglin Chang commented on HADOOP-11665: links to related jiras Provide and unify cross platform byteorder support in native code - Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HADOOP-11665.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-11665: --- Attachment: HADOOP-11665.001.patch The idea is mostly borrowed from https://github.com/google/flatbuffers/blob/master/include/flatbuffers/flatbuffers.h Compile passed on macosx and ubuntu, no ppc environment to test this. Provide and unify cross platform byteorder support in native code - Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HADOOP-11665.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
Binglin Chang created HADOOP-11665: -- Summary: Provide and unify cross platform byteorder support in native code Key: HADOOP-11665 URL: https://issues.apache.org/jira/browse/HADOOP-11665 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11505) hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases
[ https://issues.apache.org/jira/browse/HADOOP-11505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288833#comment-14288833 ] Binglin Chang commented on HADOOP-11505: Hi Colin, thanks for working on this. More background on this issue: 1. The nativetask code is only optimized for x86_64, so some function names are not ideal; the name bswap is a little confusing, since it is used for ntoh purposes. 2. So on all big-endian architectures, bswap should be a no-op. 3. I guess the use of inline assembly is because very old compilers (like the gcc3 I used back at Baidu) don't optimize a bit-shift-style ntoh function into a bswap instruction, but most compilers do now, so I am not sure the inline assembly is needed any more. So I think a cleaner way to fix this is to define ntoh32 and ntoh64 properly (bit shifts rather than assembly should be OK for modern compilers) based on the BYTE_ORDER macro, and replace all uses of bswap. hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases --- Key: HADOOP-11505 URL: https://issues.apache.org/jira/browse/HADOOP-11505 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-11505.001.patch hadoop-mapreduce-client-nativetask fails to use x86 optimizations in some cases. Also, on some alternate, non-x86, non-ARM architectures the generated code is incorrect. Thanks to Steve Loughran and Edward Nevill for finding this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
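A minimal sketch of the bit-shift ntoh64 proposed above (the function name is illustrative, not the actual patch): reading the wire bytes most-significant first is correct on any host byte order, degenerates to a plain load on big-endian machines, and modern compilers compile it to a single bswap instruction on little-endian x86, so no inline assembly is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-shift ntoh64: interpret the 8 bytes of the wire
 * value as big-endian, independent of the host's byte order. */
static uint64_t ntoh64(uint64_t net) {
    unsigned char b[8];
    memcpy(b, &net, sizeof(b));
    return ((uint64_t)b[0] << 56) | ((uint64_t)b[1] << 48) |
           ((uint64_t)b[2] << 40) | ((uint64_t)b[3] << 32) |
           ((uint64_t)b[4] << 24) | ((uint64_t)b[5] << 16) |
           ((uint64_t)b[6] << 8)  |  (uint64_t)b[7];
}

int main(void) {
    /* Simulate 8 bytes arriving off the wire in network (big-endian) order. */
    const unsigned char wire[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint64_t net;
    memcpy(&net, wire, sizeof(net));
    /* Regardless of host endianness, the decoded value is fixed. */
    assert(ntoh64(net) == 0x0102030405060708ULL);
    return 0;
}
```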
[jira] [Commented] (HADOOP-11154) Update BUILDING.txt to state that CMake 3.0 or newer is required on Mac.
[ https://issues.apache.org/jira/browse/HADOOP-11154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152747#comment-14152747 ] Binglin Chang commented on HADOOP-11154: Hi Chris, what problem did you hit when building with cmake 2.6? I am using cmake 2.8 and macos 10.9, and can build current trunk successfully. Update BUILDING.txt to state that CMake 3.0 or newer is required on Mac. Key: HADOOP-11154 URL: https://issues.apache.org/jira/browse/HADOOP-11154 Project: Hadoop Common Issue Type: Bug Components: documentation, native Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HADOOP-11154.1.patch The native code can be built on Mac now, but CMake 3.0 or newer is required. This differs from our minimum stated version of 2.6 in BUILDING.txt. I'd like to update BUILDING.txt to state that 3.0 or newer is required if building on Mac. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11154) Update BUILDING.txt to state that CMake 3.0 or newer is required on Mac.
[ https://issues.apache.org/jira/browse/HADOOP-11154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152814#comment-14152814 ] Binglin Chang commented on HADOOP-11154: Ooh, java 1.7. I'm currently using java 1.6, so that's why cmake 2.8 works for me. Thanks for the explanation, please commit. Update BUILDING.txt to state that CMake 3.0 or newer is required on Mac. Key: HADOOP-11154 URL: https://issues.apache.org/jira/browse/HADOOP-11154 Project: Hadoop Common Issue Type: Bug Components: documentation, native Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HADOOP-11154.1.patch The native code can be built on Mac now, but CMake 3.0 or newer is required. This differs from our minimum stated version of 2.6 in BUILDING.txt. I'd like to update BUILDING.txt to state that 3.0 or newer is required if building on Mac. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11132) checkHadoopHome still uses HADOOP_HOME
[ https://issues.apache.org/jira/browse/HADOOP-11132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14150930#comment-14150930 ] Binglin Chang commented on HADOOP-11132: Hi Tsuyoshi, thanks for the patch. Grepping through the code, it looks like checkHadoopHome is only used in the windows env right now. I am not sure windows has changed from HADOOP_HOME to HADOOP_PREFIX yet; at least the .cmd files still have plenty of HADOOP_HOME vars. Maybe somebody familiar with the hadoop windows version can have a look? Or have it tested on windows? checkHadoopHome still uses HADOOP_HOME -- Key: HADOOP-11132 URL: https://issues.apache.org/jira/browse/HADOOP-11132 Project: Hadoop Common Issue Type: Bug Reporter: Allen Wittenauer Attachments: HADOOP-11132.1.patch It should be using HADOOP_PREFIX. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033486#comment-14033486 ] Binglin Chang commented on HADOOP-10389: I'd like to add more input on this: I wrote a c++ rpc/hdfs/yarn client (https://github.com/decster/libhadoopclient). It uses c++11, so it does not need boost (although many people use boost, often just for header-only libraries, and if the public headers do not include boost there are no version issues). C++'s main concern is abi compatibility, which can be resolved by exposing c or simple c++ class public headers and hiding the real implementation. Some issues with using c++ / arguments for using c are: 1. centos does not have enough support for c++11; c++11 is not generally available yet. 2. Keeping libhdfs compatibility: since libhdfs is written in c, we might just continue using c as well. There are also some concerns about using c: 1. The protobuf-c library is just not as reliable as the official protobuf library, which is maintained and verified by google and many other companies/projects. I read some of the protobuf-c code; it uses a reflection-style implementation to do serializing/deserializing, so performance, security, and compatibility may all be at risk. I see https://github.com/protobuf-c/protobuf-c only has 92 stars. 2. malloc/free/memset can easily generate buggy code and need additional care and checks; I have seen many of those kinds of bugs recently (HDFS-6534, HADOOP-10640, HADOOP-10706). It is OK to use c, but we may need more care and effort. About JNIFS: why do we need jnifs if we already have nativefs? Using dlopen/dlsym to replace jni apis is not trivial if both compile-time and runtime dependencies need to be removed. 
Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389-alternative.000.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10636) Native Hadoop Client:add unit test case for callclient_id
[ https://issues.apache.org/jira/browse/HADOOP-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034770#comment-14034770 ] Binglin Chang commented on HADOOP-10636: lgtm, +1, with minor formatting changes Native Hadoop Client:add unit test case for callclient_id -- Key: HADOOP-10636 URL: https://issues.apache.org/jira/browse/HADOOP-10636 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: HADOOP-10636-pnative.001.patch, HADOOP-10636-pnative.002.patch, HADOOP-10636-pnative.003.patch, HADOOP-10636-pnative.004.patch, HADOOP-10636-pnative.005.patch, HADOOP-10636-pnative.006.patch, HADOOP-10636-pnative.007.patch, HADOOP-10636-pnative.008.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10636) Native Hadoop Client:add unit test case for callclient_id
[ https://issues.apache.org/jira/browse/HADOOP-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10636: --- Attachment: HADOOP-10636-pnative.008-commit.patch committed. Thanks, Wenwu. Native Hadoop Client:add unit test case for callclient_id -- Key: HADOOP-10636 URL: https://issues.apache.org/jira/browse/HADOOP-10636 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Attachments: HADOOP-10636-pnative.001.patch, HADOOP-10636-pnative.002.patch, HADOOP-10636-pnative.003.patch, HADOOP-10636-pnative.004.patch, HADOOP-10636-pnative.005.patch, HADOOP-10636-pnative.006.patch, HADOOP-10636-pnative.007.patch, HADOOP-10636-pnative.008-commit.patch, HADOOP-10636-pnative.008.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HADOOP-10636) Native Hadoop Client:add unit test case for callclient_id
[ https://issues.apache.org/jira/browse/HADOOP-10636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HADOOP-10636. Resolution: Fixed Fix Version/s: HADOOP-10388 Native Hadoop Client:add unit test case for callclient_id -- Key: HADOOP-10636 URL: https://issues.apache.org/jira/browse/HADOOP-10636 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Wenwu Peng Assignee: Wenwu Peng Fix For: HADOOP-10388 Attachments: HADOOP-10636-pnative.001.patch, HADOOP-10636-pnative.002.patch, HADOOP-10636-pnative.003.patch, HADOOP-10636-pnative.004.patch, HADOOP-10636-pnative.005.patch, HADOOP-10636-pnative.006.patch, HADOOP-10636-pnative.007.patch, HADOOP-10636-pnative.008-commit.patch, HADOOP-10636-pnative.008.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10699) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10699: --- Attachment: HADOOP-10699-common.v3.patch Fix build native library on mac osx --- Key: HADOOP-10699 URL: https://issues.apache.org/jira/browse/HADOOP-10699 Project: Hadoop Common Issue Type: Bug Reporter: Kirill A. Korinskiy Assignee: Binglin Chang Attachments: HADOOP-10699-common.v3.patch, HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch Some patches to fix building the hadoop native library on os x 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10706) fix some bug related to hrpc_sync_ctx
Binglin Chang created HADOOP-10706: -- Summary: fix some bug related to hrpc_sync_ctx Key: HADOOP-10706 URL: https://issues.apache.org/jira/browse/HADOOP-10706 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang 1. {code} memset(ctx, 0, sizeof(ctx)); return ctx; {code} Doing this will always zero the returned ctx after it has been set up (and sizeof(ctx) is the size of the pointer, not of the struct). 2. hrpc_release_sync_ctx should be renamed to hrpc_proxy_release_sync_ctx; all the other functions in this .h/.c file follow this naming rule -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10706) fix some bug related to hrpc_sync_ctx
[ https://issues.apache.org/jira/browse/HADOOP-10706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10706: --- Attachment: HADOOP-10706.v1.patch fix some bug related to hrpc_sync_ctx - Key: HADOOP-10706 URL: https://issues.apache.org/jira/browse/HADOOP-10706 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HADOOP-10706.v1.patch 1. {code} memset(ctx, 0, sizeof(ctx)); return ctx; {code} Doing this will always zero the returned ctx after it has been set up (and sizeof(ctx) is the size of the pointer, not of the struct). 2. hrpc_release_sync_ctx should be renamed to hrpc_proxy_release_sync_ctx; all the other functions in this .h/.c file follow this naming rule -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10668) TestZKFailoverControllerStress#testExpireBackAndForth occasionally fails
[ https://issues.apache.org/jira/browse/HADOOP-10668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032269#comment-14032269 ] Binglin Chang commented on HADOOP-10668: The test failed again in HDFS-5574 https://builds.apache.org/job/PreCommit-HDFS-Build/7127//testReport/org.apache.hadoop.ha/TestZKFailoverControllerStress/testExpireBackAndForth/ TestZKFailoverControllerStress#testExpireBackAndForth occasionally fails Key: HADOOP-10668 URL: https://issues.apache.org/jira/browse/HADOOP-10668 Project: Hadoop Common Issue Type: Test Reporter: Ted Yu Priority: Minor Labels: test From https://builds.apache.org/job/PreCommit-HADOOP-Build/4018//testReport/org.apache.hadoop.ha/TestZKFailoverControllerStress/testExpireBackAndForth/ : {code} org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.server.DataTree.getData(DataTree.java:648) at org.apache.zookeeper.server.ZKDatabase.getData(ZKDatabase.java:371) at org.apache.hadoop.ha.MiniZKFCCluster.expireActiveLockHolder(MiniZKFCCluster.java:199) at org.apache.hadoop.ha.MiniZKFCCluster.expireAndVerifyFailover(MiniZKFCCluster.java:234) at org.apache.hadoop.ha.TestZKFailoverControllerStress.testExpireBackAndForth(TestZKFailoverControllerStress.java:84) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (HADOOP-10699) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang moved YARN-2160 to HADOOP-10699: -- Key: HADOOP-10699 (was: YARN-2160) Project: Hadoop Common (was: Hadoop YARN) Fix build native library on mac osx --- Key: HADOOP-10699 URL: https://issues.apache.org/jira/browse/HADOOP-10699 Project: Hadoop Common Issue Type: Bug Reporter: Kirill A. Korinskiy Assignee: Binglin Chang Attachments: HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch Some patches to fix building the hadoop native library on os x 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10700) Fix build on macosx: HDFS parts
Binglin Chang created HADOOP-10700: -- Summary: Fix build on macosx: HDFS parts Key: HADOOP-10700 URL: https://issues.apache.org/jira/browse/HADOOP-10700 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10700) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HADOOP-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10700: --- Issue Type: Bug (was: Sub-task) Parent: (was: HADOOP-10699) Fix build on macosx: HDFS parts --- Key: HADOOP-10700 URL: https://issues.apache.org/jira/browse/HADOOP-10700 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10699) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-10699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031628#comment-14031628 ] Binglin Chang commented on HADOOP-10699: Created HDFS-6534 for the HDFS parts and YARN-2161 for the YARN parts; this jira is changed to only cover the Common parts. Fix build native library on mac osx --- Key: HADOOP-10699 URL: https://issues.apache.org/jira/browse/HADOOP-10699 Project: Hadoop Common Issue Type: Bug Reporter: Kirill A. Korinskiy Assignee: Binglin Chang Attachments: HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch Some patches to fix building the hadoop native library on os x 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-9648) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026076#comment-14026076 ] Binglin Chang commented on HADOOP-9648: --- Hi [~vinodkv] or [~jlowe], I see the latest container-executor changes were done by you. Currently the main concern of this jira is yarn-related native code changes, and you seem the right person to ask for help; could you give some comments on this? Fix build native library on mac osx --- Key: HADOOP-9648 URL: https://issues.apache.org/jira/browse/HADOOP-9648 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.0.4, 1.2.0, 1.1.2, 2.0.5-alpha Reporter: Kirill A. Korinskiy Assignee: Binglin Chang Attachments: HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch Some patches to fix building the hadoop native library on os x 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client
[ https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021649#comment-14021649 ] Binglin Chang commented on HADOOP-10640: Isn't sizeof(struct hrpc_proxy) always larger than RPC_PROXY_USERDATA_MAX?
{code}
void *hrpc_proxy_alloc_userdata(struct hrpc_proxy *proxy, size_t size)
{
    if (size > RPC_PROXY_USERDATA_MAX) {
        return NULL;
    }
    return proxy->userdata;
}

struct hrpc_sync_ctx *hrpc_proxy_alloc_sync_ctx(struct hrpc_proxy *proxy)
{
    struct hrpc_sync_ctx *ctx =
        hrpc_proxy_alloc_userdata(proxy, sizeof(struct hrpc_proxy));
    if (!ctx) {
        return NULL;
    }
    if (uv_sem_init(&ctx->sem, 0)) {
        return NULL;
    }
    memset(ctx, 0, sizeof(ctx));
    return ctx;
}
{code}
Implement Namenode RPCs in HDFS native client - Key: HADOOP-10640 URL: https://issues.apache.org/jira/browse/HADOOP-10640 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10640-pnative.001.patch, HADOOP-10640-pnative.002.patch, HADOOP-10640-pnative.003.patch Implement the parts of libhdfs that just involve making RPCs to the Namenode, such as mkdir, rename, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client
[ https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018861#comment-14018861 ] Binglin Chang commented on HADOOP-10640: In hdfs.h: {code} #if defined(unix) || defined(__MACH__) {code} bq. I don't really like the typedefs. They make it hard to forward-declare structures in header files. I see some methods in ndfs/jnifs use typedefs and some use structs; either is fine as long as they are uniform across the implementations. Implement Namenode RPCs in HDFS native client - Key: HADOOP-10640 URL: https://issues.apache.org/jira/browse/HADOOP-10640 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10640-pnative.001.patch, HADOOP-10640-pnative.002.patch Implement the parts of libhdfs that just involve making RPCs to the Namenode, such as mkdir, rename, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client
[ https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016318#comment-14016318 ] Binglin Chang commented on HADOOP-10640: Thanks for the patch, Colin. Have not finished reviewing; some comments: CMakeLists.txt: #add_subdirectory(fs) fs/CMakeLists.txt redundant? CMakeLists.txt: #include(Libhdfs.cmake) we do not have this file CMakeList.txt: -fvisibility=hidden macosx also supports this (use if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin") to detect it); you can add this or I can add it later. fs/fs.h:136 use hdfsFile/hdfsFs instead of struct hdfsFile_internal */struct hdfs_internal ? config.h.cmake HCONF_XML_TEST_PATH we can set the CLASSPATH env in tests; it is better than a static config macro when compiling (looks like clang can find more code bugs than gcc...): should add ${JNI_INCLUDE_DIRS} in include_directories: In file included from /Users/decster/projects/hadoop-trunk/hadoop-native-core/test/native_mini_dfs.c:21: /Users/decster/projects/hadoop-trunk/hadoop-native-core/jni/exception.h:37:10: fatal error: 'jni.h' file not found #include <jni.h> should be unified to tTime: /Users/decster/projects/hadoop-trunk/hadoop-native-core/ndfs/ndfs.c:1055:14: warning: incompatible pointer types initializing 'int (*)(struct hdfs_internal *, const char *, int64_t, int64_t)' with an expression of type 'int (hdfsFS, const char *, tTime, tTime)' [-Wincompatible-pointer-types] .utime = ndfs_utime, wrong memset usage: /Users/decster/projects/hadoop-trunk/hadoop-native-core/fs/common.c:39:36: warning: 'memset' call operates on objects of type 'hdfsFileInfo' (aka 'struct file_info') while the size is based on a different type 'hdfsFileInfo *' (aka 'struct file_info *') [-Wsizeof-pointer-memaccess] memset(hdfsFileInfo, 0, sizeof(hdfsFileInfo)); /Users/decster/projects/hadoop-trunk/hadoop-native-core/rpc/proxy.c:102:27: warning: 'memset' call operates on objects of type 'struct hrpc_sync_ctx' while the size is based on a different type 'struct hrpc_sync_ctx *' [-Wsizeof-pointer-memaccess] memset(ctx, 0, sizeof(ctx)); Implement Namenode RPCs in HDFS native client - Key: HADOOP-10640 URL: https://issues.apache.org/jira/browse/HADOOP-10640 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10640-pnative.001.patch Implement the parts of libhdfs that just involve making RPCs to the Namenode, such as mkdir, rename, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10640) Implement Namenode RPCs in HDFS native client
[ https://issues.apache.org/jira/browse/HADOOP-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016319#comment-14016319 ] Binglin Chang commented on HADOOP-10640: bad format... re-submitting comments. {noformat} CMakeLists.txt: #add_subdirectory(fs) fs/CMakeLists.txt redundant? CMakeLists.txt: #include(Libhdfs.cmake) we do not have this file CMakeList.txt: -fvisibility=hidden macosx also supports this (use if (${CMAKE_SYSTEM_NAME} MATCHES "Darwin") to detect it); you can add this or I can add it later. fs/fs.h:136 use hdfsFile/hdfsFs instead of struct hdfsFile_internal */struct hdfs_internal ? config.h.cmake HCONF_XML_TEST_PATH we can set the CLASSPATH env in tests; it is better than a static config macro {noformat} when compiling (looks like clang can find more code bugs than gcc...): {noformat} should add ${JNI_INCLUDE_DIRS} in include_directories: In file included from /Users/decster/projects/hadoop-trunk/hadoop-native-core/test/native_mini_dfs.c:21: /Users/decster/projects/hadoop-trunk/hadoop-native-core/jni/exception.h:37:10: fatal error: 'jni.h' file not found #include <jni.h> {noformat} should be unified to tTime: {noformat} /Users/decster/projects/hadoop-trunk/hadoop-native-core/ndfs/ndfs.c:1055:14: warning: incompatible pointer types initializing 'int (*)(struct hdfs_internal *, const char *, int64_t, int64_t)' with an expression of type 'int (hdfsFS, const char *, tTime, tTime)' [-Wincompatible-pointer-types] .utime = ndfs_utime, {noformat} wrong memset usage: {noformat} /Users/decster/projects/hadoop-trunk/hadoop-native-core/fs/common.c:39:36: warning: 'memset' call operates on objects of type 'hdfsFileInfo' (aka 'struct file_info') while the size is based on a different type 'hdfsFileInfo *' (aka 'struct file_info *') [-Wsizeof-pointer-memaccess] memset(hdfsFileInfo, 0, sizeof(hdfsFileInfo)); /Users/decster/projects/hadoop-trunk/hadoop-native-core/rpc/proxy.c:102:27: warning: 'memset' call operates on objects of type 'struct hrpc_sync_ctx' while the size is based on a different type 'struct hrpc_sync_ctx *' [-Wsizeof-pointer-memaccess] memset(ctx, 0, sizeof(ctx)); {noformat} Implement Namenode RPCs in HDFS native client - Key: HADOOP-10640 URL: https://issues.apache.org/jira/browse/HADOOP-10640 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10640-pnative.001.patch Implement the parts of libhdfs that just involve making RPCs to the Namenode, such as mkdir, rename, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10631) Native Hadoop Client: make clean should remove pb-c.h.s files
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012125#comment-14012125 ] Binglin Chang commented on HADOOP-10631: Thanks for the review, Colin. Native Hadoop Client: make clean should remove pb-c.h.s files - Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Fix For: HADOOP-10388 Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10444) add pom.xml infrastructure for hadoop-native-core
[ https://issues.apache.org/jira/browse/HADOOP-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10444: --- Attachment: HADOOP-10444.v1.patch Changes: 1. change project structure to maven, move all code from hadoop-native-core to hadoop-native-core/src/main/native 2. change build dir from hadoop-native-core to hadoop-native-core/target/native 3. add pom artifact hadoop-native-core, make according changes in hadoop-dist/pom.xml hadoop-project/pom.xml and pom.xml 4. add a new assembly hadoop-native-core-dist to copy .h files only. 5. add a new profile: native-core to activate hadoop-native-core compile/test/package, I didn't reuse -Pnative, cause currently -Pnative can't work on MacOSX. When invoke using: mvn package -Pdist -DskipTests -Pnative -Pnative-core native client libraries are packaged in following locations: {code} decster@localhost:~/hadoop-trunk$ ll hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/{include/common,include/rpc,lib/native} hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/include/common: total 76 -rw-rw-r-- 1 decster decster 2380 2014-05-29 00:10 hadoop_err.h -rw-rw-r-- 1 decster decster 1025 2014-05-29 00:10 net.h -rw-rw-r-- 1 decster decster 23781 2014-05-29 00:10 queue.h -rw-rw-r-- 1 decster decster 1349 2014-05-29 00:10 string.h -rw-rw-r-- 1 decster decster 4936 2014-05-29 00:10 test.h -rw-rw-r-- 1 decster decster 25776 2014-05-29 00:10 tree.h -rw-rw-r-- 1 decster decster 1189 2014-05-29 00:10 user.h hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/include/rpc: total 36 -rw-rw-r-- 1 decster decster 3479 2014-05-29 00:10 call.h -rw-rw-r-- 1 decster decster 1561 2014-05-29 00:10 client_id.h -rw-rw-r-- 1 decster decster 6392 2014-05-29 00:10 conn.h -rw-rw-r-- 1 decster decster 2939 2014-05-29 00:10 messenger.h -rw-rw-r-- 1 decster decster 5395 2014-05-29 00:10 proxy.h -rw-rw-r-- 1 decster decster 3700 2014-05-29 00:10 reactor.h -rw-rw-r-- 1 decster decster 2647 2014-05-29 00:10 varint.h 
hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/lib/native: total 7464 -rw-rw-r-- 1 decster decster 1239398 2014-05-29 00:10 libhadoop.a -rw-rw-r-- 1 decster decster 1374292 2014-05-29 00:10 libhadooppipes.a lrwxrwxrwx 1 decster decster 18 2014-05-29 00:10 libhadoop.so - libhadoop.so.1.0.0* -rwxrwxr-x 1 decster decster 721343 2014-05-29 00:10 libhadoop.so.1.0.0* -rw-rw-r-- 1 decster decster 453186 2014-05-29 00:10 libhadooputils.a -rw-rw-r-- 1 decster decster 317370 2014-05-29 00:10 libhdfs.a lrwxrwxrwx 1 decster decster 17 2014-05-29 00:10 libhdfs-core.so - libhdfs-core.so.1* lrwxrwxrwx 1 decster decster 21 2014-05-29 00:10 libhdfs-core.so.1 - libhdfs-core.so.1.0.0* -rwxrwxr-x 1 decster decster 2747719 2014-05-29 00:10 libhdfs-core.so.1.0.0* lrwxrwxrwx 1 decster decster 16 2014-05-29 00:10 libhdfs.so - libhdfs.so.0.0.0* -rwxrwxr-x 1 decster decster 212967 2014-05-29 00:10 libhdfs.so.0.0.0* lrwxrwxrwx 1 decster decster 17 2014-05-29 00:10 libyarn-core.so - libyarn-core.so.1* lrwxrwxrwx 1 decster decster 21 2014-05-29 00:10 libyarn-core.so.1 - libyarn-core.so.1.0.0* -rwxrwxr-x 1 decster decster 564923 2014-05-29 00:10 libyarn-core.so.1.0.0* {code} add pom.xml infrastructure for hadoop-native-core - Key: HADOOP-10444 URL: https://issues.apache.org/jira/browse/HADOOP-10444 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Binglin Chang Attachments: HADOOP-10444.v1.patch Add pom.xml infrastructure for hadoop-native-core, so that it builds under Maven. We can look to how we integrated CMake into hadoop-hdfs-project and hadoop-common-project for inspiration here. In the long term, it would be nice to use a Maven plugin here (see HADOOP-8887) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10444) add pom.xml infrastructure for hadoop-native-core
[ https://issues.apache.org/jira/browse/HADOOP-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012166#comment-14012166 ] Binglin Chang commented on HADOOP-10444: Hi Colin, the patch moves all the code, so it would be good to get it reviewed and committed soon to avoid conflicts. add pom.xml infrastructure for hadoop-native-core - Key: HADOOP-10444 URL: https://issues.apache.org/jira/browse/HADOOP-10444 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Binglin Chang Attachments: HADOOP-10444.v1.patch Add pom.xml infrastructure for hadoop-native-core, so that it builds under Maven. We can look to how we integrated CMake into hadoop-hdfs-project and hadoop-common-project for inspiration here. In the long term, it would be nice to use a Maven plugin here (see HADOOP-8887)
[jira] [Updated] (HADOOP-10444) add pom.xml infrastructure for hadoop-native-core
[ https://issues.apache.org/jira/browse/HADOOP-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10444: --- Attachment: HADOOP-10444.v2.patch Thanks for the review, Colin. I updated the patch to address your comments. Changes: 1. change -Pnative-core to -Pnative 2. remove the assembly; we can add hdfs.h (or just share the old one) and yarn.h later when we have them. add pom.xml infrastructure for hadoop-native-core - Key: HADOOP-10444 URL: https://issues.apache.org/jira/browse/HADOOP-10444 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Binglin Chang Attachments: HADOOP-10444.v1.patch, HADOOP-10444.v2.patch Add pom.xml infrastructure for hadoop-native-core, so that it builds under Maven. We can look to how we integrated CMake into hadoop-hdfs-project and hadoop-common-project for inspiration here. In the long term, it would be nice to use a Maven plugin here (see HADOOP-8887)
[jira] [Commented] (HADOOP-10444) add pom.xml infrastructure for hadoop-native-core
[ https://issues.apache.org/jira/browse/HADOOP-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013311#comment-14013311 ] Binglin Chang commented on HADOOP-10444: bq. If there is stuff that doesn't work on MacOS, we should just fix that stuff. This reminds me of HADOOP-9648; as you have done a lot of native library work, could you help take a look? add pom.xml infrastructure for hadoop-native-core - Key: HADOOP-10444 URL: https://issues.apache.org/jira/browse/HADOOP-10444 Project: Hadoop Common Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Binglin Chang Attachments: HADOOP-10444.v1.patch, HADOOP-10444.v2.patch Add pom.xml infrastructure for hadoop-native-core, so that it builds under Maven. We can look to how we integrated CMake into hadoop-hdfs-project and hadoop-common-project for inspiration here. In the long term, it would be nice to use a Maven plugin here (see HADOOP-8887)
[jira] [Created] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
Binglin Chang created HADOOP-10631: -- Summary: Native Hadoop Client: Add missing output in GenerateProtobufs.cmake Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Affects Version/s: HADOOP-10388 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Status: Patch Available (was: Open) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Target Version/s: HADOOP-10388 Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10631) Native Hadoop Client: Add missing output in GenerateProtobufs.cmake
[ https://issues.apache.org/jira/browse/HADOOP-10631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10631: --- Attachment: HADOOP-10631.v1.patch Native Hadoop Client: Add missing output in GenerateProtobufs.cmake --- Key: HADOOP-10631 URL: https://issues.apache.org/jira/browse/HADOOP-10631 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-10631.v1.patch In GenerateProtobufs.cmake, pb-c.h.s files are not added to output, so when make clean is called, those files are not cleaned. {code} add_custom_command( OUTPUT ${PB_C_FILE} ${PB_H_FILE} ${CALL_C_FILE} ${CALL_H_FILE} {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10564) Add username to native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10564: --- Attachment: HADOOP-10564-pnative.006.patch Patch looks good to me, +1. I updated the patch by one line to match the latest branch HEAD. Add username to native RPCv9 client --- Key: HADOOP-10564 URL: https://issues.apache.org/jira/browse/HADOOP-10564 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: HADOOP-10388 Attachments: HADOOP-10564-pnative.002.patch, HADOOP-10564-pnative.003.patch, HADOOP-10564-pnative.004.patch, HADOOP-10564-pnative.005.patch, HADOOP-10564-pnative.006.patch, HADOOP-10564.001.patch Add the ability for the native RPCv9 client to set a username when initiating a connection.
[jira] [Commented] (HADOOP-10564) Add username to native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992599#comment-13992599 ] Binglin Chang commented on HADOOP-10564: Hi Colin, about user.h: we may need a struct to represent a user (like UGI in Hadoop), so that in the future more things can be added to it, such as auth method and tokens; something like: struct hadoop_user; hadoop_user_(alloc|get_login|free) It is better to add it while the change is small. Thoughts? Add username to native RPCv9 client --- Key: HADOOP-10564 URL: https://issues.apache.org/jira/browse/HADOOP-10564 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10564-pnative.002.patch, HADOOP-10564.001.patch Add the ability for the native RPCv9 client to set a username when initiating a connection.
[jira] [Resolved] (HADOOP-10577) Fix some minors error and compile on macosx
[ https://issues.apache.org/jira/browse/HADOOP-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HADOOP-10577. Resolution: Fixed Fix some minors error and compile on macosx --- Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10577.v1.patch, HADOOP-10577.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998406#comment-13998406 ] Binglin Chang commented on HADOOP-10389: I think TestRPC.testSlowRpc should only use one socket based on the socket reuse logic, so the responses also go through one socket. The test basically does the following: client call (id=0) -> server client call (id=1) -> server server response (id=1) -> client client call (id=2) -> server server response (id=2) -> client server response (id=0) -> client So the client should recognize the call id in its response handling logic, otherwise responses get mismatched (it would return response 1 for call 0). bq. But last time I investigated it, each TCP socket could only do one request at once. Do you mean the current native code, or the Java code? Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997356#comment-13997356 ] Binglin Chang commented on HADOOP-10389: bq. I think the performance is actually going to be pretty good I am not worried about performance; it just may cause more redundant code. I'll wait to see some code then :) Speaking of redundant code, there is a lot of repeated code in xxx.call.c; is it possible to do this with functions rather than generating repeated code? bq. The rationale behind call id in general is that in some future version of the Java RPC system, we may want to allow multiple calls to be in flight at once I looked closely at the rpc code; it looks like concurrent rpc is supported, and the unit test TestRPC.testSlowRpc tests this. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998455#comment-13998455 ] Binglin Chang commented on HADOOP-10389: bq. The point of the generated functions is to provide type safety, so you can't pass the wrong request and response types to the functions. It also makes remote procedure calls look like a local function call, which is one of the main ideas in RPC. We can keep functions, but the repeated code in these functions can be eliminated using abstraction, so as to reduce the binary code size. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10577) Fix some minors error and compile on macosx
[ https://issues.apache.org/jira/browse/HADOOP-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10577: --- Affects Version/s: HADOOP-10388 Fix some minors error and compile on macosx --- Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10577.v1.patch, HADOOP-10577.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992528#comment-13992528 ] Binglin Chang commented on HADOOP-10389: bq. Hmm. That's odd. No matter how many warning and pedantic options I pass to gcc http://stackoverflow.com/questions/11869593/c99-printf-formatters-vs-c11-user-defined-literals Although it is C code, it could be included from C++. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996017#comment-13996017 ] Binglin Chang commented on HADOOP-10389: bq. The rationale behind call id in general is that in some future version of the Java RPC system, we may want to allow multiple calls to be in flight at once I guess I always thought this was already implemented: the client can already make parallel calls, and there are multiple rpc handler threads on the server side, so doing this should be natural and easy, although I haven't tested it. Are you sure about this? If so I can try to add it in Java... bq. From the library user's perspective, they are calling hdfsOpen, hdfsClose, etc. etc. So those methods all need to initialize hrpc_proxy again (which needs the server address, user and other configs). What I am trying to say is that maybe the proxy and the call can be separated: the proxy can be shared, with a call on the stack for each invocation. Maybe it's too late to change that, just my two cents. bq. You just can't de-allocate the proxy while it is in use. So there should be a method for the user to cancel an ongoing rpc (and to make sure that after the cancel completes there is no more memory access to hrpc_proxy and the call); it looks like hrpc_proxy_deactivate can't do this yet? Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Commented] (HADOOP-10564) Add username to native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994932#comment-13994932 ] Binglin Chang commented on HADOOP-10564: Hi Colin, thanks for the patch, some comments: 1. It's hard to remember which fields need to be freed (some are stack-allocated, some are heap-allocated) and which don't; could you add comments about each field's memory ownership? 2. In the patch, line 690 is duplicated: {code} 690 +proxy->call.remote = *remote; 691 +proxy->call.remote = *remote; {code} 3. reactor.c:71: RB_NFIND may always find nothing, depending on the RB tree compare method's contents (only pointer equality means equal). I am not familiar with the RB tree's semantics and the header file doesn't provide any documentation. Also, hrpc_conn_usable may be redundant because RB_NFIND already checks those fields. Add username to native RPCv9 client --- Key: HADOOP-10564 URL: https://issues.apache.org/jira/browse/HADOOP-10564 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HADOOP-10564-pnative.002.patch, HADOOP-10564-pnative.003.patch, HADOOP-10564.001.patch Add the ability for the native RPCv9 client to set a username when initiating a connection.
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994974#comment-13994974 ] Binglin Chang commented on HADOOP-10389: Hi Colin, I have some difficulty understanding your code and some comments, could you please help explain? 1. To my understanding, the rpc client should have a map<call id, call> to record all unfinished calls, but I could not find any code assigning call ids (they are only set to 0) or managing unfinished calls; could you help me locate that logic? 2. In the demo namenode-rpc-unit, I see each proxy only has one call (the current call); does this mean the client can only make one rpc call at a time? If so, probably every rpc call will need its own rpc_proxy. From the user's standpoint, they may want what Java's interface offers: multiple threads concurrently calling one proxy, which is very common in the hdfs client. 3. hrpc_proxy.call belongs to hrpc_proxy, but in hrpc_proxy_start the call is passed to reactor->inbox.pending_calls, which may have a longer lifecycle than hrpc_proxy, so there may be a potential bug in hrpc_proxy.call? Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996075#comment-13996075 ] Binglin Chang commented on HADOOP-10389: bq. So there should be a method for user to cancel an ongoing rpc I thought more about this, adding timeout to call also works and seems like a better solution. Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10577) Fix some minors error and compile on macosx
[ https://issues.apache.org/jira/browse/HADOOP-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992519#comment-13992519 ] Binglin Chang commented on HADOOP-10577: Thanks for the review Luke! I have committed this. Fix some minors error and compile on macosx --- Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10577.v1.patch, HADOOP-10577.v2.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10389) Native RPCv9 client
[ https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991599#comment-13991599 ] Binglin Chang commented on HADOOP-10389: Hi Yongjun, thanks for trying the patch; the issue you mentioned is right. Actually there are more issues I want to discuss, Colin: 1. a string literal and a macro need to be separated with a space, e.g. "%" PRId64 xxx; some compilers are stricter about this 2. I think it is pretty safe to use %d instead of "%" PRId32; it seems unnecessary to type more Native RPCv9 client --- Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch, HADOOP-10389.004.patch, HADOOP-10389.005.patch
[jira] [Created] (HADOOP-10577) Fix some minors error and compile on macosx
Binglin Chang created HADOOP-10577: -- Summary: Fix some minors error and compile on macosx Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10577) Fix some minors error and compile on macosx
[ https://issues.apache.org/jira/browse/HADOOP-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10577: --- Attachment: HADOOP-10577.v1.patch Changes: 1. find_library should not use the .so suffix, for cross-platform compatibility 2. clang/libc++ does not have tr1/memory, just memory 3. wrong printf usage %Zd, should be %zu? With this change, the code on the current branch now compiles on macosx on my laptop. Fix some minors error and compile on macosx --- Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10577.v1.patch
[jira] [Updated] (HADOOP-10577) Fix some minors error and compile on macosx
[ https://issues.apache.org/jira/browse/HADOOP-10577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10577: --- Attachment: HADOOP-10577.v2.patch Updated a little; found wrong usage of sem_post/sem_wait. Fix some minors error and compile on macosx --- Key: HADOOP-10577 URL: https://issues.apache.org/jira/browse/HADOOP-10577 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10577.v1.patch, HADOOP-10577.v2.patch
[jira] [Assigned] (HADOOP-10497) Doc NodeGroup-aware(HADOOP Virtualization Extensisons)
[ https://issues.apache.org/jira/browse/HADOOP-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HADOOP-10497: -- Assignee: Binglin Chang Doc NodeGroup-aware(HADOOP Virtualization Extensisons) -- Key: HADOOP-10497 URL: https://issues.apache.org/jira/browse/HADOOP-10497 Project: Hadoop Common Issue Type: Task Components: documentation Reporter: wenwupeng Assignee: Binglin Chang Labels: documentation Fix For: site Most of patches from Umbrella JIRA HADOOP-8468 have committed, However there is no site to introduce NodeGroup-aware(HADOOP Virtualization Extensisons) and how to do configuration. so we need to doc it. 1. Doc NodeGroup-aware relate in http://hadoop.apache.org/docs/current 2. Doc NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10497) Add document for node group related configs
[ https://issues.apache.org/jira/browse/HADOOP-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10497: --- Summary: Add document for node group related configs (was: Doc NodeGroup-aware(HADOOP Virtualization Extensisons)) Add document for node group related configs --- Key: HADOOP-10497 URL: https://issues.apache.org/jira/browse/HADOOP-10497 Project: Hadoop Common Issue Type: Task Components: documentation Reporter: wenwupeng Assignee: Binglin Chang Labels: documentation Fix For: site Most of patches from Umbrella JIRA HADOOP-8468 have committed, However there is no site to introduce NodeGroup-aware(HADOOP Virtualization Extensisons) and how to do configuration. so we need to doc it. 1. Doc NodeGroup-aware relate in http://hadoop.apache.org/docs/current 2. Doc NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13956157#comment-13956157 ] Binglin Chang commented on HADOOP-10388: bq. We can even make the XML-reading code optional if you want. Sure, for compatibility I guess adding xml support is fine. To keep strict compatibility we may need to support all javax xml / hadoop config features; I am not sure libexpat/libxml2 support all of those, and a lot of effort may be spent on this, so it is better to make it optional and do it later, I think. bq. Thread pools and async I/O, I'm afraid, are something we can't live without. I also prefer to use async I/O and threads for performance reasons; the code I published on github already has a working HDFS client with read/write, and HDFSOutputStream uses an additional thread. What I was saying is that the use of extra threads should be limited: in the Java client, simply reading/writing an HDFS file uses too many threads (rpc socket read/write, data transfer socket read/write, other misc executors, lease renewer, etc.). Since we use async I/O, the thread count should be greatly reduced. Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe A pure native hadoop client has the following use cases/advantages: 1. writing YARN applications in C++ 2. direct access to HDFS without extra proxy overhead, compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. Python 4. lightweight, with a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies.
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949062#comment-13949062 ] Binglin Chang commented on HADOOP-10388: Thanks for posting this Colin, looking into the code right now. [~wenwu] and I both got branch committer invitations today. He is interested in providing more tests for the feature. About the code and the created sub-jiras, here are some initial questions: # What will the project structure look like? A separate top-level hadoop-native-client project? Or separate code files in the existing common/hdfs/yarn dirs? # Why the names libhdfs-core.so and libyarn-core.so? It's a client library, which doesn't sound like "core". # I'm surprised the code turned to pure C; it seems that because of this we are introducing unusual libraries and tools (protobuf-c, whose last release was in 2011, and the tool shorten); as for the test library, isn't the C++ library gtest going to be used either? In short, what libraries are planned to be used? # I would like the library to be lightweight; some people just want a header file and a statically linked library (a few MB in size) to be able to read/write from HDFS, so some heavy features (an XML library for config-file parsing, URI parsing for cross-FileSystem symlinks, a thread pool) had better be optional, not required. Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Affects Versions: HADOOP-10388 Reporter: Binglin Chang Assignee: Colin Patrick McCabe A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1394#comment-1394 ] Binglin Chang commented on HADOOP-10388: Hi Colin, what's the status of the work now? Could you post some of it so others can cooperate? Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang Assignee: Colin Patrick McCabe A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10411) TestCacheDirectives.testExceedsCapacity fails occasionally
Binglin Chang created HADOOP-10411: -- Summary: TestCacheDirectives.testExceedsCapacity fails occasionally Key: HADOOP-10411 URL: https://issues.apache.org/jira/browse/HADOOP-10411 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Priority: Minor See this [link|https://issues.apache.org/jira/browse/HADOOP-10390?focusedCommentId=13932236page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932236], error message: {code} Namenode should not send extra CACHE commands expected:0 but was:2 Stacktrace java.lang.AssertionError: Namenode should not send extra CACHE commands expected:0 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1413) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938904#comment-13938904 ] Binglin Chang commented on HADOOP-10390: After applying the patch, libhdfs.so.0.0.0 and test_libhdfs_read/write are correctly copied to the destination location, but the DFSCIOTest job failed because the tasks failed due to a libhdfs internal error. Maybe it is my environment's problem; I have not found the root cause yet, but it is unlikely to be caused by the issue here. DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch, HADOOP-10390.v2.patch, HADOOP-10390.v3.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10390: --- Attachment: HADOOP-10390.v3.patch Updated the patch: moved test_libhdfs_read/write to HADOOP_HOME/bin DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch, HADOOP-10390.v2.patch, HADOOP-10390.v3.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931544#comment-13931544 ] Binglin Chang commented on HADOOP-10388: Hi Colin, I see you have assigned all the jiras to yourself now. Thanks for taking on this effort. I created this jira mainly because I want to help with some of the development here. Do you have a plan/idea on how to proceed with the work? Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang Assignee: Colin Patrick McCabe A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10390: --- Attachment: HADOOP-10390.v2.patch Attached a new version of the patch addressing all 3 issues; I put test_libhdfs_read/write into HADOOP_HOME/lib/native as suggested. DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch, HADOOP-10390.v2.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932079#comment-13932079 ] Binglin Chang commented on HADOOP-10390: Which location would you suggest? Do you mean we should not put those into the package distribution? DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch, HADOOP-10390.v2.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930117#comment-13930117 ] Binglin Chang commented on HADOOP-10390: bq. hdfs_read/hdfs_write no longer exist (at least not in the hadoop distribution), but they are used by DFSCIOTest; should they also go into the distribution? I was trying to update the patch to solve all the issues in the jira, but have not found a proper place to put test_libhdfs_read/test_libhdfs_write; HADOOP_HOME/bin does not feel like the right place for test programs. DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925488#comment-13925488 ] Binglin Chang commented on HADOOP-10390: Looks like there is no libhdfs.so.0: {code} [root@namenode native]# ll total 2184 -rw-r--r-- 1 67974 users 621326 Feb 11 08:55 libhadoop.a -rw-r--r-- 1 67974 users 534024 Feb 11 08:55 libhadooppipes.a lrwxrwxrwx 1 67974 users 18 Feb 17 22:37 libhadoop.so -> libhadoop.so.1.0.0 -rwxr-xr-x 1 67974 users 446741 Feb 11 08:55 libhadoop.so.1.0.0 -rw-r--r-- 1 67974 users 226360 Feb 11 08:55 libhadooputils.a -rw-r--r-- 1 67974 users 204586 Feb 11 08:55 libhdfs.a lrwxrwxrwx 1 67974 users 16 Feb 17 22:37 libhdfs.so -> libhdfs.so.0.0.0 -rwxr-xr-x 1 67974 users 167760 Feb 11 08:55 libhdfs.so.0.0.0 {code} DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925499#comment-13925499 ] Binglin Chang commented on HADOOP-10390: I reviewed the code carefully; the related code is: {code} fs.copyFromLocalFile(new Path(hadoopHome + "/libhdfs/libhdfs.so." + HDFS_LIB_VERSION), HDFS_SHLIB); fs.copyFromLocalFile(new Path(hadoopHome + "/libhdfs/hdfs_read"), HDFS_READ); fs.copyFromLocalFile(new Path(hadoopHome + "/libhdfs/hdfs_write"), HDFS_WRITE); {code} The program tries to copy hadoopHome/libhdfs/libhdfs.so.* to HDFS and fails because the file doesn't exist. Actually the whole path is wrong; I suspect the paths (HADOOP_HOME/libhdfs/libhdfs.so|hdfs_read|hdfs_write) are already out of date (maybe they existed in hadoop-v1 test environments), so in theory the test should have been failing for a long time. DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10390) DFSCIOTest looks for the wrong version of libhdfs
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925511#comment-13925511 ] Binglin Chang commented on HADOOP-10390: Currently I think there are 3 issues we need to fix: 1. libhdfs has a different path and version now; they should be updated 2. hdfs_read/hdfs_write no longer exist (at least not in the hadoop distribution), but they are used by DFSCIOTest; should they also go into the distribution? 3. on exception, DFSCIOTest only prints the error message: System.err.print(e.getLocalizedMessage()); this is so confusing that it took me a long time to find the root cause DFSCIOTest looks for the wrong version of libhdfs - Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925585#comment-13925585 ] Binglin Chang commented on HADOOP-10388: About the coding standard, google's is mostly fine. I mentioned C++11 mostly for the new std libraries (thread, lock/condition, random, unique_ptr/shared_ptr, regex), so we can avoid writing a lot of common utility code. It's fine if we use boost instead, and we can provide typedefs so that either C++11 or boost can be an option: old compilers can use boost, and new compilers can avoid the boost dependency. Agree with Colin; I tend to avoid fancy language features such as lambdas, templates and std::function. For compatibility the code should be plain and simple, especially the public API; C++ does not have good binary compatibility (mainly around virtual methods), so we need to be careful. Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925772#comment-13925772 ] Binglin Chang commented on HADOOP-10388: bq. I'd like it to build on OS/X so that mac builds catch regressions, even if it isn't for production. Agree. Mac OS X is more like FreeBSD; I do most of my coding on a Mac and can help make sure the mac build and tests work. bq. I'm not up to date with C++ test frameworks Although I haven't tried other test frameworks, I would recommend gtest; it is small and convenient (just a .cc file that can be embedded into the test program). If we are using the google C++ coding standard and protobuf, using another google framework seems natural. Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925187#comment-13925187 ] Binglin Chang commented on HADOOP-10388: Although C is viable, I would suggest using C++11, which will help us get rid of a lot of dependencies and make the code smaller. I was writing a client just for fun; it uses C++11, depends on protobuf, json-c, sasl2, gtest and cmake, and is about 8k LOC. It is on GitHub now: https://github.com/decster/libhadoopclient. Hope some of the code can be useful here. Pure native hadoop client - Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10390) libhdfs.so.1 does not exist in Hadoop 2.3.0
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10390: --- Status: Patch Available (was: Open) libhdfs.so.1 does not exist in Hadoop 2.3.0 --- Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: tools Affects Versions: 2.3.0 Reporter: wenwupeng Assignee: Binglin Chang Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10390) libhdfs.so.1 does not exist in Hadoop 2.3.0
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10390: --- Attachment: HADOOP-10390.v1.patch Looks like the problem is caused by a version mismatch between the libhdfs CMake build file and DFSCIOTest.HDFS_LIB_VERSION. One of them needs to be changed. The patch changes DFSCIOTest.HDFS_LIB_VERSION libhdfs.so.1 does not exist in Hadoop 2.3.0 --- Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: tools Affects Versions: 2.3.0 Reporter: wenwupeng Assignee: Binglin Chang Attachments: HADOOP-10390.v1.patch Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10388) Pure native hadoop client
Binglin Chang created HADOOP-10388: -- Summary: Pure native hadoop client Key: HADOOP-10388 URL: https://issues.apache.org/jira/browse/HADOOP-10388 Project: Hadoop Common Issue Type: New Feature Reporter: Binglin Chang A pure native hadoop client has the following use cases/advantages: 1. writing Yarn applications in C++ 2. direct access to HDFS, without extra proxy overhead compared to the web/NFS interfaces 3. wrapping the native library to support more languages, e.g. python 4. lightweight: a small footprint compared to the several hundred MB of the JDK and Hadoop libraries with their various dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10389) Native RPCv9 client
Binglin Chang created HADOOP-10389: -- Summary: Native RPCv9 client Key: HADOOP-10389 URL: https://issues.apache.org/jira/browse/HADOOP-10389 Project: Hadoop Common Issue Type: Sub-task Reporter: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HADOOP-10390) libhdfs.so.1 does not exist in Hadoop 2.3.0
[ https://issues.apache.org/jira/browse/HADOOP-10390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HADOOP-10390: -- Assignee: Binglin Chang libhdfs.so.1 does not exist in Hadoop 2.3.0 --- Key: HADOOP-10390 URL: https://issues.apache.org/jira/browse/HADOOP-10390 Project: Hadoop Common Issue Type: Bug Components: tools Affects Versions: 2.3.0 Reporter: wenwupeng Assignee: Binglin Chang Run benchmark DFSCIOTest failed at libhdfs.so.1 hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar DFSCIOTest -write -nrFiles 1 -fileSize 100 DFSCIOTest.0.0.1 14/03/06 02:52:55 INFO fs.DFSCIOTest: nrFiles = 1 14/03/06 02:52:55 INFO fs.DFSCIOTest: fileSize (MB) = 100 14/03/06 02:52:55 INFO fs.DFSCIOTest: bufferSize = 100 14/03/06 02:52:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable File /hadoop/hadoop-smoke/libhdfs/libhdfs.so.1 does not exist can get libhdfs.so.0.0.0 under ./lib/native [root@namenode hadoop-smoke]# find ./ -name libhdfs* ./lib/native/libhdfs.so ./lib/native/libhdfs.so.0.0.0 ./lib/native/libhdfs.a -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-9648) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9648: -- Status: Patch Available (was: Open) Fix build native library on mac osx --- Key: HADOOP-9648 URL: https://issues.apache.org/jira/browse/HADOOP-9648 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.0.5-alpha, 1.1.2, 1.2.0, 1.0.4 Reporter: Kirill A. Korinskiy Attachments: HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch Some patches fixing the build of the Hadoop native library on OS X 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-9648) Fix build native library on mac osx
[ https://issues.apache.org/jira/browse/HADOOP-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9648: -- Attachment: HADOOP-9648.v2.patch Fixed some issues of the original patch and related test failures in test-container-executor; changes: 1. Issue: setnetgrent was deleted in the original code, which is not right; we don't need the if test but still need to call setnetgrent 2. Issue: mkdirs skips creating a dir if the path exists, but if the path is a file it still succeeds. Fix: changed the whole implementation; mkdirat and openat are not needed anymore 3. LOGIN_NAME_MAX is not present on macOS; changed to use sysconf 4. fcloseall is not present on macOS; changed to close the opened fds (stdin, stdout, stderr) 5. macOS/FreeBSD do not have cgroups; disable the feature and print an error message test-container-executor issues: 6. macOS does not have a "bin" user, so skip one test 7. on macOS /etc/passwd is not a real path (it involves a symlink); changed to /bin/ls Now compiling with native and running test-container-executor succeed on my MacBook. Fix build native library on mac osx --- Key: HADOOP-9648 URL: https://issues.apache.org/jira/browse/HADOOP-9648 Project: Hadoop Common Issue Type: Bug Affects Versions: 1.0.4, 1.2.0, 1.1.2, 2.0.5-alpha Reporter: Kirill A. Korinskiy Attachments: HADOOP-9648-native-osx.1.0.4.patch, HADOOP-9648-native-osx.1.1.2.patch, HADOOP-9648-native-osx.1.2.0.patch, HADOOP-9648-native-osx.2.0.5-alpha-rc1.patch, HADOOP-9648.v2.patch Some patches fixing the build of the Hadoop native library on OS X 10.7/10.8. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837404#comment-13837404 ] Binglin Chang commented on HADOOP-10130: Thanks for the review and commit, Colin! RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Fix For: 2.3.0 Attachments: HADOOP-10130.v1.patch, HADOOP-10130.v2.patch, HADOOP-10130.v2.patch, HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10130: --- Attachment: HADOOP-10130.v2.patch Build crashed somehow, resubmit. RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10130.v1.patch, HADOOP-10130.v2.patch, HADOOP-10130.v2.patch, HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
Binglin Chang created HADOOP-10130: -- Summary: RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10130: --- Attachment: HDFS-5575.v1.patch Attach patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10130: --- Status: Patch Available (was: Open) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10130: --- Attachment: HADOOP-10130.v1.patch Wrong patch file name; renaming and submitting again. RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10130.v1.patch, HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13833987#comment-13833987 ] Binglin Chang commented on HADOOP-10130: RawLocalFileSystem is a public class, so I was not sure whether I could remove the inner class; it looks safe to do so. Attaching a new patch. RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10130.v1.patch, HADOOP-10130.v2.patch, HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-10130) RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics
[ https://issues.apache.org/jira/browse/HADOOP-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-10130: --- Attachment: HADOOP-10130.v2.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- Key: HADOOP-10130 URL: https://issues.apache.org/jira/browse/HADOOP-10130 Project: Hadoop Common Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HADOOP-10130.v1.patch, HADOOP-10130.v2.patch, HDFS-5575.v1.patch RawLocalFS::LocalFSFileInputStream.pread does not track FS::Statistics -- This message was sent by Atlassian JIRA (v6.1#6144)
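The bug in this thread is that the positional read path skips the byte accounting that the sequential read path performs, so pread'd bytes never show up in FS::Statistics. A minimal, self-contained sketch of the idea, with hypothetical class names standing in for FileSystem.Statistics and RawLocalFS::LocalFSFileInputStream (the actual fix is in HADOOP-10130.v2.patch):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for FileSystem.Statistics: a per-filesystem byte counter.
class Stats {
    private final AtomicLong bytesRead = new AtomicLong();
    void incrementBytesRead(long n) { bytesRead.addAndGet(n); }
    long getBytesRead() { return bytesRead.get(); }
}

// Hypothetical stand-in for a local input stream, backed by an in-memory buffer.
class LocalIn {
    private final byte[] data;
    private final Stats stats;
    private int pos = 0;

    LocalIn(byte[] data, Stats stats) { this.data = data; this.stats = stats; }

    // Sequential read: this path already tracked statistics before the fix.
    int read(byte[] buf, int off, int len) {
        int n = pread(pos, buf, off, len);
        if (n > 0) pos += n;
        return n;
    }

    // Positional read: the fix is to report bytes read here as well,
    // instead of returning without touching the counter.
    int pread(long position, byte[] buf, int off, int len) {
        if (position >= data.length) return -1;
        int n = (int) Math.min(len, data.length - position);
        System.arraycopy(data, (int) position, buf, off, n);
        stats.incrementBytesRead(n); // previously missing on the pread path
        return n;
    }
}
```

With this shape, both read paths funnel through the same counter, so getBytesRead() reflects positional reads too.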
[jira] [Updated] (HADOOP-9897) Add method to get path start position without drive specifier in o.a.h.fs.Path
[ https://issues.apache.org/jira/browse/HADOOP-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9897: -- Attachment: HADOOP-9897.v3.patch Thanks for the review, Chris. Attaching a new patch addressing your comments. Add method to get path start position without drive specifier in o.a.h.fs.Path Key: HADOOP-9897 URL: https://issues.apache.org/jira/browse/HADOOP-9897 Project: Hadoop Common Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-9897.v1.patch, HADOOP-9897.v2.patch, HADOOP-9897.v2.patch, HADOOP-9897.v3.patch There is a lot of code in Path that computes the start position after skipping the drive specifier, like: {code} int start = hasWindowsDrive(uri.getPath()) ? 3 : 0; {code} Also, there is a minor bug in mergePaths: mergePaths(/, /foo) yields Path(//foo), where //foo is parsed as a URI authority, not a path. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-9897) Add method to get path start position without drive specifier in o.a.h.fs.Path
[ https://issues.apache.org/jira/browse/HADOOP-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9897: -- Attachment: HADOOP-9897.v4.patch Add method to get path start position without drive specifier in o.a.h.fs.Path Key: HADOOP-9897 URL: https://issues.apache.org/jira/browse/HADOOP-9897 Project: Hadoop Common Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-9897.v1.patch, HADOOP-9897.v2.patch, HADOOP-9897.v2.patch, HADOOP-9897.v3.patch, HADOOP-9897.v4.patch There is a lot of code in Path that computes the start position after skipping the drive specifier, like: {code} int start = hasWindowsDrive(uri.getPath()) ? 3 : 0; {code} Also, there is a minor bug in mergePaths: mergePaths(/, /foo) yields Path(//foo), where //foo is parsed as a URI authority, not a path. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-9897) Add method to get path start position without drive specifier in o.a.h.fs.Path
[ https://issues.apache.org/jira/browse/HADOOP-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9897: -- Attachment: HADOOP-9897.v5.patch Thanks for the review, Chris! Attaching a new version of the patch. Add method to get path start position without drive specifier in o.a.h.fs.Path Key: HADOOP-9897 URL: https://issues.apache.org/jira/browse/HADOOP-9897 Project: Hadoop Common Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-9897.v1.patch, HADOOP-9897.v2.patch, HADOOP-9897.v2.patch, HADOOP-9897.v3.patch, HADOOP-9897.v4.patch, HADOOP-9897.v5.patch There is a lot of code in Path that computes the start position after skipping the drive specifier, like: {code} int start = hasWindowsDrive(uri.getPath()) ? 3 : 0; {code} Also, there is a minor bug in mergePaths: mergePaths(/, /foo) yields Path(//foo), where //foo is parsed as a URI authority, not a path. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HADOOP-9897) Add method to get path start position without drive specifier in o.a.h.fs.Path
[ https://issues.apache.org/jira/browse/HADOOP-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HADOOP-9897: -- Attachment: HADOOP-9897.v6.patch Really sorry for that; attaching a new patch. Add method to get path start position without drive specifier in o.a.h.fs.Path Key: HADOOP-9897 URL: https://issues.apache.org/jira/browse/HADOOP-9897 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 3.0.0, 2.2.0 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-9897.v1.patch, HADOOP-9897.v2.patch, HADOOP-9897.v2.patch, HADOOP-9897.v3.patch, HADOOP-9897.v4.patch, HADOOP-9897.v5.patch, HADOOP-9897.v6.patch There is a lot of code in Path that computes the start position after skipping the drive specifier, like: {code} int start = hasWindowsDrive(uri.getPath()) ? 3 : 0; {code} Also, there is a minor bug in mergePaths: mergePaths(/, /foo) yields Path(//foo), where //foo is parsed as a URI authority, not a path. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9897) Add method to get path start position without drive specifier in o.a.h.fs.Path
[ https://issues.apache.org/jira/browse/HADOOP-9897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791328#comment-13791328 ] Binglin Chang commented on HADOOP-9897: --- Hi [~cnauroth], could you help review the patch again and get this committed? Thanks. Add method to get path start position without drive specifier in o.a.h.fs.Path Key: HADOOP-9897 URL: https://issues.apache.org/jira/browse/HADOOP-9897 Project: Hadoop Common Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HADOOP-9897.v1.patch, HADOOP-9897.v2.patch, HADOOP-9897.v2.patch There is a lot of code in Path that computes the start position after skipping the drive specifier, like: {code} int start = hasWindowsDrive(uri.getPath()) ? 3 : 0; {code} Also, there is a minor bug in mergePaths: mergePaths(/, /foo) yields Path(//foo), where //foo is parsed as a URI authority, not a path. -- This message was sent by Atlassian JIRA (v6.1#6144)
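Both the repeated drive-specifier idiom and the mergePaths doubled-slash bug described in this thread can be illustrated with a small sketch. The helper names below are hypothetical stand-ins; the actual method added by this JIRA lives in o.a.h.fs.Path:

```java
final class PathUtil {
    private PathUtil() {}

    // A path like "/C:/foo" starts with a Windows drive specifier:
    // slash, drive letter, colon.
    static boolean hasWindowsDrive(String path) {
        return path.length() >= 3 && path.charAt(0) == '/'
                && Character.isLetter(path.charAt(1)) && path.charAt(2) == ':';
    }

    // The kind of helper this JIRA proposes: the index just past any
    // drive specifier, replacing the scattered "? 3 : 0" idiom.
    static int startPositionWithoutWindowsDrive(String path) {
        return hasWindowsDrive(path) ? 3 : 0;
    }

    // Naive concatenation reproduces the bug: "/" + "/foo" -> "//foo",
    // and a leading "//" is parsed as a URI authority rather than a path.
    // Skipping the second path's leading '/' when the first already ends
    // with one avoids the doubled slash.
    static String mergePaths(String p1, String p2) {
        int start = startPositionWithoutWindowsDrive(p2);
        if (p1.endsWith("/") && start < p2.length() && p2.charAt(start) == '/') {
            start++;
        }
        return p1 + p2.substring(start);
    }
}
```

With this guard, mergePaths("/", "/foo") produces "/foo" instead of "//foo", so URI parsing no longer mistakes the merged path's prefix for an authority.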
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776870#comment-13776870 ] Binglin Chang commented on HADOOP-9972: ---
bq. Also, if we want to add more options in the future, we don't want to create listLinkStatusWithFoo and listLinkStatusWithFooAndBar. Just listStatus(Path, PathOption).
That is exactly why I propose implementing listStatus(Path, PathOption) in FileSystem on top of the more primitive listLinkStatus(Path), so that if we add an option, we don't end up modifying the code of every FileSystem subclass.
bq. we don't want to create listLinkStatusWithFoo and listLinkStatusWithFooAndBar. Just listStatus(Path, PathOption).
I am not against the listStatus(Path, PathOption) API, just its implementation details; this issue can be solved by listStatus(Path, PathOption).
bq. Hadoop and HDFS exist in an environment where there are unreliable networks.
I don't think we should ignore all errors, including network issues; they are like disk failures or temporarily unreadable files on Linux, which globbing can't ignore either. In that case the error should just be passed all the way up to the user; most users don't want to handle this kind of error in an ErrorHandler either.
bq. So if globStatus swallows unresolved symlink errors.
Are you saying a network issue can cause an unresolved symlink error? If dead-link errors are already mixed up with network errors, then for that plus compatibility reasons I agree with you: we can't follow the Linux practice. new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. 
For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13773584#comment-13773584 ] Binglin Chang commented on HADOOP-9972: ---
bq. Hmm. We could have a convenience method called listLinkStatus which just called into listStatus with the correct PathOptions. I sort of lean towards fewer APIs rather than more, but maybe it makes sense.
I mean that listStatus(Path, PathOption) should call into listLinkStatus (which is HDFS::listStatus, a primitive RPC call), not the other way around. I wonder how we can implement listStatus(Path, PathOption) without the listLinkStatus(Path) primitive?
bq. Shell globbing doesn't ignore all errors
What I mean by globbing is just shell wildcard substitution, and it does indeed ignore all errors; glob merely substitutes a string containing wildcards with matching strings. http://www.linuxjournal.com/content/bash-extended-globbing http://tldp.org/LDP/abs/html/globbingref.html
{code}
drwxr-xr-x 2 decster staff 68 Sep 19 17:09 aa
drwxr-xr-x 2 decster staff 68 Sep 19 17:12 bb
decster:~/projects/test echo *
aa bb
decster:~/projects/test echo */cc
*/cc
{code}
In your example:
{code}
cmccabe@keter:~/mydir ls b/c
ls: cannot access b/c: Permission denied   # this error is thrown by ls, not globbing
cmccabe@keter:~/mydir ls *
a:
c
ls: cannot open directory b: Permission denied
# ls * first becomes ls a b
# then ls throws the error when processing b
{code}
new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. 
For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772566#comment-13772566 ] Binglin Chang commented on HADOOP-9972: --- There are two issues we are talking about. One is the new API:
bq. The discussion about whether HDFS should replace listStatus with something more like POSIX readdir seems like a tangent.
I think there is some confusion here: I didn't propose using POSIX readdir. The API name readdir is probably causing the confusion, so I changed it to listLinkStatus instead; its semantics are the same as the current HDFS listStatus, which doesn't resolve links.
bq. To prevent this scenario, we want to change FileStatus#listStatus and FileStatus#globStatus to resolve all symlinks
I'm fully aware of this, and my proposal does not break it. Frankly, I don't see any conflict between the two proposals. In order to implement listStatus(Path, PathOption), a listLinkStatus (or something with the same semantics) primitive/core API is required, and it is mostly there (HDFS has it; other filesystems don't support symlinks, except LocalFS). Since there is no conflict on my side, I think you can just submit the patch, or give the implementation details of listStatus(Path, PathOption) first.
The other issue is that globbing doesn't follow the Linux practice. It is probably a tangent; it was brought up only because of the example about the usage of PathErrorHandler. My point is that Linux shell globbing ignores all errors, so the example can be solved by following the Linux practice. If we decide not to follow the Linux practice and solve it another way, that is OK, although I prefer the Linux practice. 
new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772570#comment-13772570 ] Binglin Chang commented on HADOOP-9972: --- You are probably confused by my earlier comments. I did not mean that listLinkStatus only returns filename and type.
bq. Most linux/bsd system, readdir return filename and type.
I meant Linux readdir in my comments, not the core API listLinkStatus. new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770471#comment-13770471 ] Binglin Chang commented on HADOOP-9972: --- Hi Colin, about the globStatus example: if we follow the Linux practice,
{code}
globStatus(pattern) = glob(pattern).map(path => getFileStatus(path))
String[] glob(pattern):
  if the pattern matches nothing, return the pattern itself
  else return the matched paths, ignoring all exceptions
{code}
I did some experiments; you can see that ls * indeed should show an error message, but ls */stuff should not show an error message.
{code}
[root@master01 test]# mkdir -p aa/cc/foo
[root@master01 test]# mkdir -p bb/cc/foo
[root@master01 test]# chmod 700 bb
[root@master01 test]# ll /home/serengeti/.bash
[root@master01 test]# su serengeti
[serengeti@master01 test]$ ll
total 8
drwxr-xr-x 3 root root 4096 Sep 18 08:30 aa
drwx------ 3 root root 4096 Sep 18 08:31 bb
[serengeti@master01 test]$ ls *
aa:
cc
ls: bb: Permission denied
[serengeti@master01 test]$ ls */cc
foo
{code}
Separating globStatus into glob and getFileStatus seems a more proper way of implementing globStatus than adding new classes/interfaces and a callback handler; it follows the Linux practice and should be more robust. new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. 
It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
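The Linux-style semantics argued for in this thread — glob returns the pattern itself when nothing matches, and wildcard expansion itself swallows errors — can be modeled in a few lines. This is a sketch of the proposal, not Hadoop's actual globStatus; the class name and the single-component matching rules are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ShellGlob {
    private ShellGlob() {}

    // Expand a shell wildcard pattern against a list of candidate paths.
    // Linux shell behavior modeled here: '*' and '?' do not cross '/',
    // and if nothing matches, the pattern itself is returned unchanged
    // (compare "echo */cc" printing "*/cc" when there is no match).
    static List<String> glob(String pattern, List<String> candidates) {
        // Translate shell wildcards to a regex; quote everything else.
        StringBuilder re = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            switch (c) {
                case '*': re.append("[^/]*"); break;
                case '?': re.append("[^/]"); break;
                default:  re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern p = Pattern.compile(re.toString());
        List<String> matches = new ArrayList<>();
        for (String path : candidates) {
            if (p.matcher(path).matches()) matches.add(path);
        }
        if (matches.isEmpty()) matches.add(pattern); // no match: echo the pattern
        return matches;
    }
}
```

Under this model, globStatus(pattern) would simply be glob(pattern) mapped through getFileStatus, with any error surfacing from getFileStatus rather than from the expansion step — mirroring how ls, not the shell's glob, reports "Permission denied" in the experiments above.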
[jira] [Commented] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks
[ https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770496#comment-13770496 ] Binglin Chang commented on HADOOP-9972: --- Regarding the API, I think we should differentiate core APIs from extended/legacy APIs. IMO, there should be 3 core APIs:
getFileStatus - resolves symlinks
getFileLinkStatus - doesn't resolve symlinks
readdir - doesn't resolve symlinks, just like the current HDFS listStatus
These core APIs should be implemented in each FS. All other related APIs can be built on the core APIs and implemented in FileContext/FileSystem once and for all:
{code}
FS.listStatus(path):
  readdir(path).map(s => if (s.isSymlink) getFileStatus (ignore Exception) else s)
FS.listStatus(path, PathOptions):
  readdir(path).map(process PathOptions)
glob(pattern):
  if pattern matches none, return pattern
  else return matched paths, ignoring all exceptions
globStatus(pattern):
  glob(pattern).map(getFileStatus)
{code}
new APIs for listStatus and globStatus to deal with symlinks Key: HADOOP-9972 URL: https://issues.apache.org/jira/browse/HADOOP-9972 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 2.1.1-beta Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal with symlinks. The issue is that code has been written which is incompatible with the existence of things which are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, assumes that what it is looking at is a directory. In the case of a symlink, this assumption is incorrect. It seems reasonable to make the default behavior of {{FileSystem#listStatus}} and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring dangling ones. This will prevent incompatibility with existing MR jobs and other HDFS users. 
We should also add new versions of listStatus and globStatus that allow new, symlink-aware code to deal with symlinks as symlinks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
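The layering proposed in this last comment — each concrete FileSystem implements only the primitives, and the base class builds the symlink-resolving listStatus out of readdir plus getFileStatus, dropping dangling links — can be made concrete. All type and method names below are hypothetical; this models the proposal, not the API that shipped:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal status record.
class Status {
    final String path;
    final boolean symlink;
    Status(String path, boolean symlink) {
        this.path = path;
        this.symlink = symlink;
    }
}

// The primitives each concrete FileSystem would implement.
interface CoreFs {
    List<Status> readdir(String dir);   // no symlink resolution
    Status getFileStatus(String path);  // resolves symlinks; throws if dangling
}

// Built once in the base layer: resolve each symlink entry via
// getFileStatus, silently dropping dangling links (the proposed default,
// which keeps existing directory-or-file assumptions working).
final class FsOps {
    private FsOps() {}

    static List<Status> listStatus(CoreFs fs, String dir) {
        List<Status> out = new ArrayList<>();
        for (Status s : fs.readdir(dir)) {
            if (!s.symlink) {
                out.add(s);
                continue;
            }
            try {
                out.add(fs.getFileStatus(s.path)); // follow the link
            } catch (RuntimeException dangling) {
                // dangling symlink: ignored by design
            }
        }
        return out;
    }
}
```

A symlink-aware listStatus(Path, PathOption) variant would then be another thin wrapper over the same readdir primitive, which is the crux of the argument: adding an option changes one place, not every FileSystem subclass.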