[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933541#comment-16933541 ] Sahil Takiar commented on HDFS-14564: - Looks like the libhdfs tests are working now. All the failed JUnit tests look flaky, and pass when I run them locally. [~smeng], [~jojochuang] any other comments? > Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable > - > > Key: HDFS-14564 > URL: https://issues.apache.org/jira/browse/HDFS-14564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Splitting this out from HDFS-14478 > The {{PositionedReadable#readFully}} APIs have existed for a while, but have > never been exposed via libhdfs. > HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that > provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not > contain a {{readFully}} method. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs
[ https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar resolved HDFS-14846. - Fix Version/s: 3.3.0 Resolution: Fixed > libhdfs tests are failing on trunk due to jni usage bugs > > > Key: HDFS-14846 > URL: https://issues.apache.org/jira/browse/HDFS-14846 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.3.0 > > > While working on HDFS-14564, I noticed that the libhdfs tests are failing on > trunk (both on Hadoop QA and locally). I did some digging and found out that > the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able > to pinpoint what caused this regression, but my best guess is that an upgrade > in the JDK we use in Hadoop QA started causing these failures. I looked back > at some old JIRAs and it looks like the tests work on Java 1.8.0_212, but > Hadoop QA is running 1.8.0_222 (as is my local env) (I couldn't confirm this > theory because I'm having trouble getting Java 1.8.0_212 installed next to > 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history > back to a known good commit where the libhdfs tests passed, the tests still fail, > so I don't think a code change caused the regressions). > The failures are a bunch of "FATAL ERROR in native method: Bad global or > local ref passed to JNI" errors. After doing some debugging, it looks like > {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to > {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we > have some checks to avoid this, but it looks like they don't work as > expected). > There are a few places in the libhdfs code where this pattern causes a crash, > as well as one place in {{JniBasedUnixGroupsMapping}}.
[jira] [Updated] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs
[ https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14846: Description: While working on HDFS-14564, I noticed that the libhdfs tests are failing on trunk (both on Hadoop QA and locally). I did some digging and found out that the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able to pinpoint what caused this regression, but my best guess is that an upgrade in the JDK we use in Hadoop QA started causing these failures. I looked back at some old JIRAs and it looks like the tests work on Java 1.8.0_212, but Hadoop QA is running 1.8.0_222 (as is my local env) (I couldn't confirm this theory because I'm having trouble getting Java 1.8.0_212 installed next to 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history back to a known good commit where the libhdfs tests passed, the tests still fail, so I don't think a code change caused the regressions). The failures are a bunch of "FATAL ERROR in native method: Bad global or local ref passed to JNI" errors. After doing some debugging, it looks like {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we have some checks to avoid this, but it looks like they don't work as expected). There are a few places in the libhdfs code where this pattern causes a crash, as well as one place in {{JniBasedUnixGroupsMapping}}. was: While working on HDFS-14564, I noticed that the libhdfs tests are failing on trunk (both on hadoop-yetus and locally). I did some digging and found out that the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able to pinpoint what caused this regression, but my best guess is that an upgrade in the JDK we use in hadoop-yetus started causing these failures.
I looked back at some old JIRAs and it looks like the tests work on Java 1.8.0_212, but yetus is running 1.8.0_222 (as is my local env) (I couldn't confirm this theory because I'm having trouble getting 1.8.0_212 installed next to 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history back to a known good commit where the libhdfs tests passed, the tests still fail, so I don't think a code change caused the regressions). The failures are a bunch of "FATAL ERROR in native method: Bad global or local ref passed to JNI" errors. After doing some debugging, it looks like {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we have some checks to avoid this, but it looks like they don't work as expected). There are a few places in the libhdfs code where this pattern causes a crash, as well as one place in {{JniBasedUnixGroupsMapping}}. > libhdfs tests are failing on trunk due to jni usage bugs > > > Key: HDFS-14846 > URL: https://issues.apache.org/jira/browse/HDFS-14846 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > > While working on HDFS-14564, I noticed that the libhdfs tests are failing on > trunk (both on Hadoop QA and locally). I did some digging and found out that > the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able > to pinpoint what caused this regression, but my best guess is that an upgrade > in the JDK we use in Hadoop QA started causing these failures.
I looked back > at some old JIRAs and it looks like the tests work on Java 1.8.0_212, but > Hadoop QA is running 1.8.0_222 (as is my local env) (I couldn't confirm this > theory because I'm having trouble getting Java 1.8.0_212 installed next to > 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history > back to a known good commit where the libhdfs tests passed, the tests still fail, > so I don't think a code change caused the regressions). > The failures are a bunch of "FATAL ERROR in native method: Bad global or > local ref passed to JNI" errors. After doing some debugging, it looks like > {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to > {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we > have some checks to avoid this, but it looks like they don't work as > expected). > There are a few places in the libhdfs code where this pattern causes a crash, > as well as one place in {{JniBasedUnixGroupsMapping}}.
[jira] [Commented] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs
[ https://issues.apache.org/jira/browse/HDFS-14846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16929293#comment-16929293 ] Sahil Takiar commented on HDFS-14846: - Hadoop QA looks good. [~jojochuang], [~smeng] could you take a look? > libhdfs tests are failing on trunk due to jni usage bugs > > > Key: HDFS-14846 > URL: https://issues.apache.org/jira/browse/HDFS-14846 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While working on HDFS-14564, I noticed that the libhdfs tests are failing on > trunk (both on hadoop-yetus and locally). I did some digging and found out > that the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been > able to pinpoint what caused this regression, but my best guess is that an > upgrade in the JDK we use in hadoop-yetus started causing these failures. I > looked back at some old JIRAs and it looks like the tests work on Java > 1.8.0_212, but yetus is running 1.8.0_222 (as is my local env) (I couldn't > confirm this theory because I'm having trouble getting 1.8.0_212 installed next > to 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history > back to a known good commit where the libhdfs tests passed, the tests still fail, > so I don't think a code change caused the regressions). > The failures are a bunch of "FATAL ERROR in native method: Bad global or > local ref passed to JNI" errors. After doing some debugging, it looks like > {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to > {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we > have some checks to avoid this, but it looks like they don't work as > expected). > There are a few places in the libhdfs code where this pattern causes a crash, > as well as one place in {{JniBasedUnixGroupsMapping}}.
[jira] [Created] (HDFS-14846) libhdfs tests are failing on trunk due to jni usage bugs
Sahil Takiar created HDFS-14846: --- Summary: libhdfs tests are failing on trunk due to jni usage bugs Key: HDFS-14846 URL: https://issues.apache.org/jira/browse/HDFS-14846 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs, native Reporter: Sahil Takiar Assignee: Sahil Takiar While working on HDFS-14564, I noticed that the libhdfs tests are failing on trunk (both on hadoop-yetus and locally). I did some digging and found out that the {{-Xcheck:jni}} flag is causing a bunch of crashes. I haven't been able to pinpoint what caused this regression, but my best guess is that an upgrade in the JDK we use in hadoop-yetus started causing these failures. I looked back at some old JIRAs and it looks like the tests work on Java 1.8.0_212, but yetus is running 1.8.0_222 (as is my local env) (I couldn't confirm this theory because I'm having trouble getting 1.8.0_212 installed next to 1.8.0_222 on my Ubuntu machine) (even after rewinding the commit history back to a known good commit where the libhdfs tests passed, the tests still fail, so I don't think a code change caused the regressions). The failures are a bunch of "FATAL ERROR in native method: Bad global or local ref passed to JNI" errors. After doing some debugging, it looks like {{-Xcheck:jni}} now errors out if any code tries to pass a local ref to {{DeleteLocalRef}} twice (previously it looked like it didn't complain) (we have some checks to avoid this, but it looks like they don't work as expected). There are a few places in the libhdfs code where this pattern causes a crash, as well as one place in {{JniBasedUnixGroupsMapping}}.
[jira] [Commented] (HDFS-13984) getFileInfo of libhdfs call NameNode#getFileStatus twice
[ https://issues.apache.org/jira/browse/HDFS-13984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871888#comment-16871888 ] Sahil Takiar commented on HDFS-13984: - Yeah, this makes sense to me. [~yangjiandan] do you have plans to continue working on this? A few comments: * Use {{javaObjectIsOfClass}} in {{jni_helper.h}} * The {{FileNotFoundException}} needs to be freed before returning null > getFileInfo of libhdfs call NameNode#getFileStatus twice > > > Key: HDFS-13984 > URL: https://issues.apache.org/jira/browse/HDFS-13984 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs >Reporter: Jiandan Yang >Assignee: Jiandan Yang >Priority: Major > Attachments: HDFS-13984.001.patch, HDFS-13984.002.patch, > HDFS-13984.003.patch > > > getFileInfo in hdfs.c calls *FileSystem#exists* first, then calls > *FileSystem#getFileStatus*. > *FileSystem#exists* also calls *FileSystem#getFileStatus*, just as follows: > {code:java} > public boolean exists(Path f) throws IOException { > try { > return getFileStatus(f) != null; > } catch (FileNotFoundException e) { > return false; > } > } > {code} > and finally this leads to calling NameNodeRpcServer#getFileInfo twice. > Actually we can implement this with a single call.
[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865707#comment-16865707 ] Sahil Takiar commented on HDFS-14564: - {{test_libhdfs_threaded_hdfspp_test_shim_static}} is an hdfs++ test, which should be unrelated to this feature. Will re-trigger Hadoop QA. > Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable > - > > Key: HDFS-14564 > URL: https://issues.apache.org/jira/browse/HDFS-14564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Splitting this out from HDFS-14478 > The {{PositionedReadable#readFully}} APIs have existed for a while, but have > never been exposed via libhdfs. > HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that > provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not > contain a {{readFully}} method.
[jira] [Commented] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864460#comment-16864460 ] Sahil Takiar commented on HDFS-14564: - [~smeng] addressed the checkstyle issues. Ran the failed unit tests locally and they pass. > Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable > - > > Key: HDFS-14564 > URL: https://issues.apache.org/jira/browse/HDFS-14564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Splitting this out from HDFS-14478 > The {{PositionedReadable#readFully}} APIs have existed for a while, but have > never been exposed via libhdfs. > HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that > provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not > contain a {{readFully}} method.
[jira] [Updated] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14564: Summary: Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable (was: Add libhdfs APIs for readFully; add readFully to ByteByfferPositionedReadable) > Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable > - > > Key: HDFS-14564 > URL: https://issues.apache.org/jira/browse/HDFS-14564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > Splitting this out from HDFS-14478 > The {{PositionedReadable#readFully}} APIs have existed for a while, but have > never been exposed via libhdfs. > HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that > provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not > contain a {{readFully}} method.
[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile
[ https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14478: Description: HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows specifying configuration values for opening files (similar to HADOOP-14365). Support for {{openFile}} will be a little tricky as it is asynchronous and {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}. At a high level, the API for {{openFile}} could look something like this: {code:java} hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags, int bufferSize, short replication, tSize blocksize); hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, const char *path); hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder, const char *key, const char *value); hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder, const char *key, const char *value); hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder); void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder); hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future); hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future, int64_t timeout, javaConcurrentTimeUnit timeUnit); int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future, int mayInterruptIfRunning); void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future); {code} Instead of exposing all the functionality of {{CompletableFuture}} libhdfs would just expose the functionality of {{Future}}. was: HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows specifying configuration values for opening files (similar to HADOOP-14365). Support for {{openFile}} will be a little tricky as it is asynchronous and {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
At a high level, the API for {{openFile}} could look something like this: {code:java} LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, const char *path, int flags); LIBHDFS_EXTERNAL void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL void hdfsOpenFileFutureFree(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t timeout, const char* timeUnit); LIBHDFS_EXTERNAL void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, bool mayInterruptIfRunning); {code} Instead of exposing all the functionality of {{CompletableFuture}} libhdfs would just expose the functionality of {{Future}}. > Add libhdfs APIs for openFile > - > > Key: HDFS-14478 > URL: https://issues.apache.org/jira/browse/HDFS-14478 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows > specifying configuration values for opening files (similar to HADOOP-14365). > Support for {{openFile}} will be a little tricky as it is asynchronous and > {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this: > {code:java} > hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags, > int bufferSize, short replication, tSize blocksize); > hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, > const char *path); > hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsOpenFileBuilder *builder, > const char *key, const char *value); > hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsOpenFileBuilder *builder, > const char *key, const char *value); > hdfsOpenFileFuture *hdfsOpenFileBuilderBuild(hdfsOpenFileBuilder *builder); > void hdfsOpenFileBuilderFree(hdfsOpenFileBuilder *builder); > hdfsFile hdfsOpenFileFutureGet(hdfsOpenFileFuture *future); > hdfsFile hdfsOpenFileFutureGetWithTimeout(hdfsOpenFileFuture *future, > int64_t timeout, javaConcurrentTimeUnit timeUnit); > int hdfsOpenFileFutureCancel(hdfsOpenFileFuture *future, > int mayInterruptIfRunning); > void hdfsOpenFileFutureFree(hdfsOpenFileFuture *future); > {code} > Instead of exposing all the functionality of {{CompletableFuture}} libhdfs > would just expose the functionality of {{Future}}.
[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile
[ https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14478: Description: HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows specifying configuration values for opening files (similar to HADOOP-14365). Support for {{openFile}} will be a little tricky as it is asynchronous and {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}. At a high level, the API for {{openFile}} could look something like this: {code:java} LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, const char *path, int flags); LIBHDFS_EXTERNAL void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL void hdfsOpenFileFutureFree(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t timeout, const char* timeUnit); LIBHDFS_EXTERNAL void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, bool mayInterruptIfRunning); {code} Instead of exposing all the functionality of {{CompletableFuture}} libhdfs would just expose the functionality of {{Future}}. was: HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows specifying configuration values for opening files (similar to HADOOP-14365). The {{PositionedReadable#readFully}} APIs have always existed. Both of these APIs should be exposed by libhdfs.
Adding support for {{readFully}} should be straightforward (a new libhdfs API called {{hdfsPreadFully}} whose implementation is similar to {{hdfsPread}}). Support for {{openFile}} will be a little trickier as it is asynchronous and {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}. At a high level, the API for {{openFile}} could look something like this: {code:java} LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, const char *path, int flags); LIBHDFS_EXTERNAL void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, const char *key, const char *value); LIBHDFS_EXTERNAL hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld); LIBHDFS_EXTERNAL void hdfsOpenFileFutureFree(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future); LIBHDFS_EXTERNAL hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t timeout, const char* timeUnit); LIBHDFS_EXTERNAL void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, bool mayInterruptIfRunning); {code} Instead of exposing all the functionality of {{CompletableFuture}} libhdfs would just expose the functionality of {{Future}}. > Add libhdfs APIs for openFile > - > > Key: HDFS-14478 > URL: https://issues.apache.org/jira/browse/HDFS-14478 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows > specifying configuration values for opening files (similar to HADOOP-14365).
> > Support for {{openFile}} will be a little tricky as it is asynchronous and > {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}. > At a high level, the API for {{openFile}} could look something like this: > {code:java} > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, > const char *path, int flags); > LIBHDFS_EXTERNAL > void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld); > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, > const char *key, const char *value); > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, > const char *key, const char *value); > LIBHDFS_EXTERNAL > hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld); > LIBHDFS_EXTERNAL > void hdfsOpenFileFutureFree(struct hdfsFileFuture *future); > LIBHDFS_EXTERNAL > hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future); > LIBHDFS_EXTERNAL > hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t > timeout, const char* timeUnit); > LIBHDFS_EXTERNAL > void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, > bool mayInterruptIfRunning); > {code} > Instead of exposing all the functionality of {{CompletableFuture}} libhdfs > would just expose the functionality of {{Future}}.
[jira] [Created] (HDFS-14564) Add libhdfs APIs for readFully; add readFully to ByteByfferPositionedReadable
Sahil Takiar created HDFS-14564: --- Summary: Add libhdfs APIs for readFully; add readFully to ByteByfferPositionedReadable Key: HDFS-14564 URL: https://issues.apache.org/jira/browse/HDFS-14564 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, libhdfs, native Reporter: Sahil Takiar Assignee: Sahil Takiar Splitting this out from HDFS-14478 The {{PositionedReadable#readFully}} APIs have existed for a while, but have never been exposed via libhdfs. HDFS-3246 added a new interface called {{ByteBufferPositionedReadable}} that provides a {{ByteBuffer}} version of {{PositionedReadable}}, but it does not contain a {{readFully}} method.
[jira] [Updated] (HDFS-14478) Add libhdfs APIs for openFile
[ https://issues.apache.org/jira/browse/HDFS-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14478: Summary: Add libhdfs APIs for openFile (was: Add libhdfs APIs for readFully and openFile) > Add libhdfs APIs for openFile > - > > Key: HDFS-14478 > URL: https://issues.apache.org/jira/browse/HDFS-14478 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows > specifying configuration values for opening files (similar to HADOOP-14365). > The {{PositionedReadable#readFully}} APIs have always existed. > Both of these APIs should be exposed by libhdfs. > Adding support for {{readFully}} should be straightforward (a new libhdfs > API called {{hdfsPreadFully}} whose implementation is similar to > {{hdfsPread}}). > Support for {{openFile}} will be a little trickier as it is asynchronous and > {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}.
> At a high level, the API for {{openFile}} could look something like this: > {code:java} > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, > const char *path, int flags); > LIBHDFS_EXTERNAL > void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld); > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, > const char *key, const char *value); > LIBHDFS_EXTERNAL > struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, > const char *key, const char *value); > LIBHDFS_EXTERNAL > hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld); > LIBHDFS_EXTERNAL > void hdfsOpenFileFutureFree(struct hdfsFileFuture *future); > LIBHDFS_EXTERNAL > hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future); > LIBHDFS_EXTERNAL > hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future, int64_t > timeout, const char* timeUnit); > LIBHDFS_EXTERNAL > void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, > bool mayInterruptIfRunning); > {code} > Instead of exposing all the functionality of {{CompletableFuture}} libhdfs > would just expose the functionality of {{Future}}.
[jira] [Commented] (HDFS-14482) Crash when using libhdfs with bad classpath
[ https://issues.apache.org/jira/browse/HDFS-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838831#comment-16838831 ] Sahil Takiar commented on HDFS-14482: - [~tlipcon] thanks for catching this. Opened a PR: https://github.com/apache/hadoop/pull/816 > Crash when using libhdfs with bad classpath > --- > > Key: HDFS-14482 > URL: https://issues.apache.org/jira/browse/HDFS-14482 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Todd Lipcon >Assignee: Sahil Takiar >Priority: Major > > HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the > env but before checking whether it's null. In the case that getJNIEnv() fails > to create an env, it returns NULL, and then we crash when calling > initCachedClasses() on line 555 > {code} > 551 state->env = getGlobalJNIEnv(); > 552 mutexUnlock(); > 553 > 554 jthrowable jthr = NULL; > 555 jthr = initCachedClasses(state->env); > 556 if (jthr) { > 557 printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL, > 558 "initCachedClasses failed"); > 559 goto fail; > {code}
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835622#comment-16835622 ] Sahil Takiar commented on HDFS-3246: No objections on my end. > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting.
[jira] [Created] (HDFS-14478) Add libhdfs APIs for readFully and openFile
Sahil Takiar created HDFS-14478: --- Summary: Add libhdfs APIs for readFully and openFile Key: HDFS-14478 URL: https://issues.apache.org/jira/browse/HDFS-14478 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, libhdfs, native Reporter: Sahil Takiar Assignee: Sahil Takiar HADOOP-15229 added a "FileSystem builder-based openFile() API" that allows specifying configuration values for opening files (similar to HADOOP-14365). The {{PositionedReadable#readFully}} APIs have always existed. Both of these APIs should be exposed by libhdfs. Adding support for {{readFully}} should be straightforward (a new libhdfs API called {{hdfsPreadFully}} whose implementation is similar to {{hdfsPread}}). Support for {{openFile}} will be a little trickier as it is asynchronous and {{FutureDataInputStreamBuilder#build}} returns a {{CompletableFuture}}. At a high level, the API for {{openFile}} could look something like this:
{code:c}
LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderAlloc(hdfsFS fs, const char *path, int flags);

LIBHDFS_EXTERNAL
void hdfsOpenFileBuilderFree(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderMust(hdfsFS fs, const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsOpenFileBuilder *hdfsOpenFileBuilderOpt(hdfsFS fs, const char *key, const char *value);

LIBHDFS_EXTERNAL
struct hdfsFileFuture *hdfsOpenFileBuilderBuild(struct hdfsOpenFileBuilder *bld);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureFree(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGet(struct hdfsFileFuture *future);

LIBHDFS_EXTERNAL
hdfsFile hdfsOpenFileFutureGetWithTimeout(struct hdfsFileFuture *future, int64_t timeout, const char *timeUnit);

LIBHDFS_EXTERNAL
void hdfsOpenFileFutureCancel(struct hdfsFileFuture *future, bool mayInterruptIfRunning);
{code}
Instead of exposing all the functionality of {{CompletableFuture}}, libhdfs would just expose the functionality of {{Future}}.
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Resolution: Fixed Fix Version/s: 3.3.0 Status: Resolved (was: Patch Available) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14417) CryptoInputStream decrypt methods can avoid usage of intermediate buffers
Sahil Takiar created HDFS-14417: --- Summary: CryptoInputStream decrypt methods can avoid usage of intermediate buffers Key: HDFS-14417 URL: https://issues.apache.org/jira/browse/HDFS-14417 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Sahil Takiar Filing this as a follow-up to [this|https://github.com/apache/hadoop/pull/597#discussion_r269399800] review comment on HDFS-3246. In {{CryptoInputStream}} the {{decrypt}} methods rely on temporary buffers to help decrypt the given encrypted data. The {{decrypt}} methods work by copying the input data chunk by chunk into an "input" buffer and then passing the "input" buffer and an "output" buffer to a decryption method that reads from the "input" buffer and writes to the "output" buffer. The contents of the "output" buffer are then copied back into the user buffer. Then the method moves on to the next chunk of the user buffer. Instead of copying all the data between these buffers, we should be able to decrypt the data in place, i.e. the input and output buffers are the same. As the comment points out, OpenSSL supports this. At the very least, we should be able to remove the usage of the "output" buffer and just pass the user buffer directly to the decryption classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808184#comment-16808184 ] Sahil Takiar commented on HDFS-14394: - Thanks for the input, Todd. One last thing I forgot to mention: Hadoop QA didn't run the libhdfs tests for whatever reason. I ran them manually against this patch and they all passed. For anyone else having trouble getting the tests to work reliably on Linux, I was only able to get them to work properly while inside the Hadoop Docker image (run {{./start-build-env.sh}}). > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808067#comment-16808067 ] Sahil Takiar commented on HDFS-14394: - I can add the {{-fextended-identifiers}} flag without any issues. I added {{-pedantic-errors}} and there are a ton of warnings that have now become errors (including errors from unit tests and third-party libraries). As you pointed out, I don't see a good way of applying {{CMAKE_C_FLAGS}} to specific projects, but if someone has a smart way of doing so I'm open to at least fixing the errors in libhdfs. > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807365#comment-16807365 ] Sahil Takiar commented on HDFS-14394: - [~eyang] do those tests pass locally on trunk without the patch posted here? I've seen some errors like that before and it is usually environmental. Hadoop QA was able to build the patch, but I don't think it ran the libhdfs++ tests. I'm not sure I understand that link, but I don't think adding {{-std=c99}} to {{CMAKE_C_FLAGS}} forces all libhdfs++ code to be compiled using C99. My understanding is that {{CMAKE_C_FLAGS}} is for compiling C files and {{CMAKE_CXX_FLAGS}} is for compiling C++ files. The libhdfs++ CMake file already includes {{-std=c++11}}. There are a few C files in libhdfs++, but most seem to be testing-related. > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. 
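One commonly used way to scope flags more narrowly than the global {{CMAKE_C_FLAGS}}, in line with the per-project scoping discussed above, is to set them at directory or target scope. A sketch (the target name and comment paths are illustrative, not the actual Hadoop build files):

```cmake
# In the libhdfs subdirectory's CMakeLists.txt: a set() here applies at
# directory scope only, so libhdfs++ and third-party code are unaffected.
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c99")

# Or, with CMake >= 2.8.12, attach the flag to a single target:
# target_compile_options(hdfs PRIVATE -std=c99)
```

Directory-scoped `set()` only affects targets defined at or below that directory, which is what "applying CMAKE_C_FLAGS to specific projects" amounts to in practice.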
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807272#comment-16807272 ] Sahil Takiar commented on HDFS-14394: - [~eyang] what was the compilation error that you hit? I double checked and I'm able to compile the patch locally. Is there a specific reason we don't want the {{-std=c99}} flag in libhdfs++? Would moving the C_FLAGS setting to {{src/main/native/libhdfs/CMakeLists.txt}} fix this? I'm not sure what the policy is on using {{-std=c99}} or {{-std=gnu99}}, although we were already using {{gnu99}} for Solaris builds. I don't think there is a specific reason we need {{gnu99}} over {{c99}} though. > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. 
[jira] [Commented] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807214#comment-16807214 ] Sahil Takiar commented on HDFS-14394: - Thanks for the input [~jojochuang], [~tlipcon]. Posted a patch that adds {{set(CMAKE_C_FLAGS "-std=gnu99 ${CMAKE_C_FLAGS}")}} to {{HadoopCommon.cmake}}. It looks like we already do this for Solaris builds. [~eyang] not entirely sure if I follow your reasoning for splitting {{hadoop-hdfs-native-client}} into several sub-projects, could you expand a bit more? Are you referring to [this|https://github.com/cmake-maven-project/cmake-maven-project] Maven plugin? It certainly looks interesting. However, both of these changes look like relatively large projects that are probably out of the scope of what this JIRA is trying to address. > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. 
[jira] [Updated] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14394: Attachment: HDFS-14394.001.patch > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
[ https://issues.apache.org/jira/browse/HDFS-14394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14394: Status: Patch Available (was: Open) > Add -std=c99 / -std=gnu99 to libhdfs compile flags > -- > > Key: HDFS-14394 > URL: https://issues.apache.org/jira/browse/HDFS-14394 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14394.001.patch > > > libhdfs compilation currently does not enforce a minimum required C version. > As of today, the libhdfs build on Hadoop QA works, but when built on a > machine with an outdated gcc / cc version where C89 is the default, > compilation fails due to errors such as: > {code} > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > error: ‘for’ loop initial declarations are only allowed in C99 mode > for (int i = 0; i < numCachedClasses; i++) { > ^ > /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: > note: use option -std=c99 or -std=gnu99 to compile your code > {code} > We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that > we can enforce C99 as the minimum required version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14394) Add -std=c99 / -std=gnu99 to libhdfs compile flags
Sahil Takiar created HDFS-14394: --- Summary: Add -std=c99 / -std=gnu99 to libhdfs compile flags Key: HDFS-14394 URL: https://issues.apache.org/jira/browse/HDFS-14394 Project: Hadoop HDFS Issue Type: Task Components: hdfs-client, libhdfs, native Reporter: Sahil Takiar Assignee: Sahil Takiar libhdfs compilation currently does not enforce a minimum required C version. As of today, the libhdfs build on Hadoop QA works, but when built on a machine with an outdated gcc / cc version where C89 is the default, compilation fails due to errors such as: {code} /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: error: ‘for’ loop initial declarations are only allowed in C99 mode for (int i = 0; i < numCachedClasses; i++) { ^ /build/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:106:5: note: use option -std=c99 or -std=gnu99 to compile your code {code} We should add the -std=c99 / -std=gnu99 flags to libhdfs compilation so that we can enforce C99 as the minimum required version. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Open (was: Patch Available) > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a > lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open) > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a > lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open) > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a > lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Open (was: Patch Available) > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a > lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798571#comment-16798571 ] Sahil Takiar commented on HDFS-3246: [~daryn] done.

> pRead equivalent for direct read path
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client, performance
> Affects Versions: 3.0.0-alpha1
> Reporter: Henry Robinson
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, HDFS-3246.006.patch, HDFS-3246.007.patch
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding one. It would be relatively easy to implement for the distributed case (certainly compared to HDFS-2834), since DFSInputStream does most of the heavy lifting.
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798384#comment-16798384 ] Sahil Takiar commented on HDFS-3246: Thanks for taking a look [~openinx], addressed your comments and updated the PR.
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Open (was: Patch Available)

> Fix JNI exception handling issues in libhdfs
>
> Key: HDFS-14348
> URL: https://issues.apache.org/jira/browse/HDFS-14348
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client, libhdfs, native
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
>
> During some manual digging through the libhdfs code, we found several places where we are not handling exceptions properly.
> Specifically, there seem to be some violations of the following snippet from the Oracle JNI docs (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes):
> {quote}
> *Exceptions and Error Codes*
> Certain JNI functions use the Java exception mechanism to report error conditions. In most cases, JNI functions report error conditions by returning an error code and throwing a Java exception. The error code is usually a special return value (such as NULL) that is outside of the range of normal return values. Therefore, the programmer can quickly check the return value of the last JNI call to determine if an error has occurred, and call a function, ExceptionOccurred(), to obtain the exception object that contains a more detailed description of the error condition.
> There are two cases where the programmer needs to check for exceptions without being able to first check an error code:
> [1] The JNI functions that invoke a Java method return the result of the Java method. The programmer must call ExceptionOccurred() to check for possible exceptions that occurred during the execution of the Java method.
> [2] Some of the JNI array access functions do not return an error code, but may throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
> In all other cases, a non-error return value guarantees that no exceptions have been thrown.
> {quote}
> Here is a running list of issues:
> * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but does not check if an exception has occurred; it only checks if the result of the method (in this case {{Class#getName(String)}}) returns {{NULL}}
> * Exception handling in {{get_current_thread_id}} (both {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) seems to have several issues; lots of JNI methods are called without checking for exceptions
> * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} in {{hdfs.c}} do not check for exceptions properly
> ** e.g. for {{GetObjectArrayElement}} they only check if the result of the operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for pending exceptions as well
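Case [1] above is the pattern {{classNameOfObject}} violates: a method-invocation result must be paired with an {{ExceptionOccurred}} check, because a non-NULL return alone does not prove success. A minimal sketch of the correct shape, with the JNIEnv calls simulated by globals so it runs without a JVM (none of these names are actual libhdfs functions):

```c
#include <stddef.h>

/* Simulated JNI state: a pending exception the "JVM" may set. */
static void *pending_exception;

static void *call_object_method(void) { return (void *)"result"; }
static void *exception_occurred(void) { return pending_exception; }
static void  exception_clear(void)    { pending_exception = NULL; }

/* Correct pattern for case [1]: poll for a pending exception first;
 * only then may a NULL result be treated as a plain error code.
 * Returns 0 on success, -1 if the callee threw or returned NULL. */
int invoke_and_check(void **out) {
    void *result = call_object_method();
    if (exception_occurred() != NULL) {
        exception_clear();   /* translate/log the exception, then clear it */
        *out = NULL;
        return -1;
    }
    if (result == NULL) {
        *out = NULL;
        return -1;
    }
    *out = result;
    return 0;
}
```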
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Open (was: Patch Available)
[jira] [Created] (HDFS-14386) Improve libhdfs test coverage for failure paths
Sahil Takiar created HDFS-14386:

Summary: Improve libhdfs test coverage for failure paths
Key: HDFS-14386
URL: https://issues.apache.org/jira/browse/HDFS-14386
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client, libhdfs, native
Reporter: Sahil Takiar
Assignee: Sahil Takiar

While working on HDFS-14304 and HDFS-14348, it seems that libhdfs does not have great test coverage for failure paths. We found a few places in libhdfs where we are not propagating / handling exceptions properly. The goal of this JIRA is to improve test coverage for the failure / exception handling code in libhdfs. I don't have a clear picture of how to do this, but here are some ideas:
(1) Create a dummy {{FileSystem}} where all operations throw an {{Exception}}, and call into that {{FileSystem}} using libhdfs.
(2) We already do things like trying to open a file that does not exist; we can add tests that list a directory that does not exist, etc.
(3) It would be great if we could use some type of method stubbing (like Mockito in Java) for JNI methods, so we could test that our usage of the JNI is correct - e.g. if {{NewByteArray}} returns {{NULL}}, do we actually throw an exception?
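Idea (3) can be approximated even without a mocking framework by routing JNI calls through a function pointer that a test swaps for a failing stub. A self-contained sketch of the technique; the names are illustrative, not actual libhdfs symbols:

```c
#include <stddef.h>

/* Route the allocation through a function pointer so a test can
 * substitute a stub that fails, exercising the error path. */
typedef void *(*new_byte_array_fn)(size_t len);

static char scratch[64];

static void *real_new_byte_array(size_t len) {
    return len <= sizeof(scratch) ? scratch : NULL;
}

static void *failing_new_byte_array(size_t len) {
    (void)len;
    return NULL;  /* simulates NewByteArray returning NULL (e.g. OOM) */
}

/* Tests reassign this to inject failures. */
static new_byte_array_fn new_byte_array = real_new_byte_array;

/* Code under test: must turn a NULL array into an error, not crash. */
int copy_to_java(size_t len) {
    void *arr = new_byte_array(len);
    if (arr == NULL)
        return -1;  /* would throw/propagate OutOfMemoryError in libhdfs */
    return 0;
}
```

A test then flips `new_byte_array = failing_new_byte_array;` and asserts that `copy_to_java` reports the failure instead of dereferencing NULL.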
[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798228#comment-16798228 ] Sahil Takiar commented on HDFS-14348:
Think I found another issue, although this one seems minor. {{translateZCRException}} in {{hdfs.c}} does not always free the exception; specifically, if this code path is hit:
{code}
if (!strcmp(className, "java.lang.UnsupportedOperationException")) {
  ret = EPROTONOSUPPORT;
  goto done;
}
{code}
the supplied {{jthrowable}} is never freed, because the {{done}} block only consists of:
{code}
done:
  free(className);
  return ret;
{code}
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Open (was: Patch Available)
[jira] [Work stopped] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-14304 stopped by Sahil Takiar. --- > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}}. > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}}, which acquires {{hdfsHashMutex}} while performing a > lookup in an {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open)
[jira] [Work started] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-14304 started by Sahil Takiar. ---
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795353#comment-16795353 ] Sahil Takiar commented on HDFS-3246: [~openinx], [~jojochuang] any additional comments? > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Open (was: Patch Available) > Fix JNI exception handling issues in libhdfs > > > Key: HDFS-14348 > URL: https://issues.apache.org/jira/browse/HDFS-14348 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > During some manual digging through the libhdfs code, we found several places > where we are not handling exceptions properly. > Specifically, there seem to be some violation of the following snippet from > the JNI Oracle docs > (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): > {quote} > *Exceptions and Error Codes* > Certain JNI functions use the Java exception mechanism to report error > conditions. In most cases, JNI functions report error conditions by returning > an error code and throwing a Java exception. The error code is usually a > special return value (such as NULL) that is outside of the range of normal > return values. Therefore, the programmer can quickly check the return value > of the last JNI call to determine if an error has occurred, and call a > function, ExceptionOccurred(), to obtain the exception object that contains a > more detailed description of the error condition. > There are two cases where the programmer needs to check for exceptions > without being able to first check an error code: > [1] The JNI functions that invoke a Java method return the result of the Java > method. The programmer must call ExceptionOccurred() to check for possible > exceptions that occurred during the execution of the Java method. > [2] Some of the JNI array access functions do not return an error code, but > may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. 
> In all other cases, a non-error return value guarantees that no exceptions > have been thrown. > {quote} > Here is a running list of issues: > * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but > does not check if an exception has occurred, it only checks if the result of > the method (in this case {{Class#getName(String)}}) returns {{NULL}} > * Exception handling in {{get_current_thread_id}} (both > {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) > seems to have several issues; lots of JNI methods are called without checking > for exceptions > * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} > in {{hdfs.c}} do not check for exceptions properly > ** e.g. for {{GetObjectArrayElement}} they only check if the result of the > operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for > pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
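The running list above notes that callers such as {{classNameOfObject}} test only the return value of {{CallObjectMethod}} for {{NULL}}, which case [1] of the quoted spec says is not sufficient. A minimal sketch of the stricter pattern follows; the function shape and the pre-resolved {{jmethodID}} are illustrative, not the actual libhdfs code:

```c
#include <jni.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: `mid` is assumed to be the jmethodID of
 * java.lang.Class#getName(), looked up elsewhere. Not the real
 * classNameOfObject from jni_helper.c. */
static char *class_name_of(JNIEnv *env, jobject obj, jmethodID mid)
{
    char *result = NULL;
    jclass cls = (*env)->GetObjectClass(env, obj);
    if (!cls)
        return NULL;
    jstring jname = (jstring) (*env)->CallObjectMethod(env, cls, mid);
    /* A Java-method call may leave a pending exception even when the
     * return value looks usable, so check for one explicitly instead
     * of relying on a NULL result alone. */
    if ((*env)->ExceptionCheck(env)) {
        (*env)->ExceptionClear(env);    /* or propagate as a jthrowable */
        goto done;
    }
    if (jname) {
        const char *utf = (*env)->GetStringUTFChars(env, jname, NULL);
        if (utf) {
            result = strdup(utf);
            (*env)->ReleaseStringUTFChars(env, jname, utf);
        }
        (*env)->DeleteLocalRef(env, jname);
    }
done:
    (*env)->DeleteLocalRef(env, cls);
    return result;
}
```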
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open) > Fix JNI exception handling issues in libhdfs > > > Key: HDFS-14348 > URL: https://issues.apache.org/jira/browse/HDFS-14348 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > During some manual digging through the libhdfs code, we found several places > where we are not handling exceptions properly. > Specifically, there seem to be some violation of the following snippet from > the JNI Oracle docs > (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): > {quote} > *Exceptions and Error Codes* > Certain JNI functions use the Java exception mechanism to report error > conditions. In most cases, JNI functions report error conditions by returning > an error code and throwing a Java exception. The error code is usually a > special return value (such as NULL) that is outside of the range of normal > return values. Therefore, the programmer can quickly check the return value > of the last JNI call to determine if an error has occurred, and call a > function, ExceptionOccurred(), to obtain the exception object that contains a > more detailed description of the error condition. > There are two cases where the programmer needs to check for exceptions > without being able to first check an error code: > [1] The JNI functions that invoke a Java method return the result of the Java > method. The programmer must call ExceptionOccurred() to check for possible > exceptions that occurred during the execution of the Java method. > [2] Some of the JNI array access functions do not return an error code, but > may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. 
> In all other cases, a non-error return value guarantees that no exceptions > have been thrown. > {quote} > Here is a running list of issues: > * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but > does not check if an exception has occurred, it only checks if the result of > the method (in this case {{Class#getName(String)}}) returns {{NULL}} > * Exception handling in {{get_current_thread_id}} (both > {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) > seems to have several issues; lots of JNI methods are called without checking > for exceptions > * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} > in {{hdfs.c}} do not check for exceptions properly > ** e.g. for {{GetObjectArrayElement}} they only check if the result of the > operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for > pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14348: Status: Patch Available (was: Open) > Fix JNI exception handling issues in libhdfs > > > Key: HDFS-14348 > URL: https://issues.apache.org/jira/browse/HDFS-14348 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > During some manual digging through the libhdfs code, we found several places > where we are not handling exceptions properly. > Specifically, there seem to be some violation of the following snippet from > the JNI Oracle docs > (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): > {quote} > *Exceptions and Error Codes* > Certain JNI functions use the Java exception mechanism to report error > conditions. In most cases, JNI functions report error conditions by returning > an error code and throwing a Java exception. The error code is usually a > special return value (such as NULL) that is outside of the range of normal > return values. Therefore, the programmer can quickly check the return value > of the last JNI call to determine if an error has occurred, and call a > function, ExceptionOccurred(), to obtain the exception object that contains a > more detailed description of the error condition. > There are two cases where the programmer needs to check for exceptions > without being able to first check an error code: > [1] The JNI functions that invoke a Java method return the result of the Java > method. The programmer must call ExceptionOccurred() to check for possible > exceptions that occurred during the execution of the Java method. > [2] Some of the JNI array access functions do not return an error code, but > may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. 
> In all other cases, a non-error return value guarantees that no exceptions > have been thrown. > {quote} > Here is a running list of issues: > * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but > does not check if an exception has occurred, it only checks if the result of > the method (in this case {{Class#getName(String)}}) returns {{NULL}} > * Exception handling in {{get_current_thread_id}} (both > {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) > seems to have several issues; lots of JNI methods are called without checking > for exceptions > * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} > in {{hdfs.c}} do not check for exceptions properly > ** e.g. for {{GetObjectArrayElement}} they only check if the result of the > operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for > pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791757#comment-16791757 ] Sahil Takiar commented on HDFS-14348: - PR: https://github.com/apache/hadoop/pull/600 > Fix JNI exception handling issues in libhdfs > > > Key: HDFS-14348 > URL: https://issues.apache.org/jira/browse/HDFS-14348 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > During some manual digging through the libhdfs code, we found several places > where we are not handling exceptions properly. > Specifically, there seem to be some violation of the following snippet from > the JNI Oracle docs > (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): > {quote} > *Exceptions and Error Codes* > Certain JNI functions use the Java exception mechanism to report error > conditions. In most cases, JNI functions report error conditions by returning > an error code and throwing a Java exception. The error code is usually a > special return value (such as NULL) that is outside of the range of normal > return values. Therefore, the programmer can quickly check the return value > of the last JNI call to determine if an error has occurred, and call a > function, ExceptionOccurred(), to obtain the exception object that contains a > more detailed description of the error condition. > There are two cases where the programmer needs to check for exceptions > without being able to first check an error code: > [1] The JNI functions that invoke a Java method return the result of the Java > method. The programmer must call ExceptionOccurred() to check for possible > exceptions that occurred during the execution of the Java method. > [2] Some of the JNI array access functions do not return an error code, but > may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. 
> In all other cases, a non-error return value guarantees that no exceptions > have been thrown. > {quote} > Here is a running list of issues: > * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but > does not check if an exception has occurred, it only checks if the result of > the method (in this case {{Class#getName(String)}}) returns {{NULL}} > * Exception handling in {{get_current_thread_id}} (both > {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) > seems to have several issues; lots of JNI methods are called without checking > for exceptions > * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} > in {{hdfs.c}} do not check for exceptions properly > ** e.g. for {{GetObjectArrayElement}} they only check if the result of the > operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for > pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14348) Fix JNI exception handling issues in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790898#comment-16790898 ] Sahil Takiar commented on HDFS-14348: - Found some more issues: {code} static jthrowable hadoopRzOptionsGetEnumSet(JNIEnv *env, struct hadoopRzOptions *opts, jobject *enumSet) { ... jclass clazz = (*env)->FindClass(env, READ_OPTION); if (!clazz) { jthr = newRuntimeError(env, "failed " "to find class for %s", READ_OPTION); goto done; } ... {code} {code} jthrowable newRuntimeError(JNIEnv *env, const char *fmt, ...) { char buf[512]; jobject out, exc; jstring jstr; va_list ap; va_start(ap, fmt); vsnprintf(buf, sizeof(buf), fmt, ap); va_end(ap); jstr = (*env)->NewStringUTF(env, buf); if (!jstr) { // We got an out of memory exception rather than a RuntimeException. // Too bad... return getPendingExceptionAndClear(env); } ... {code} The issue is that {{FindClass}} can throw an error, but the call to {{newRuntimeError}} calls {{NewStringUTF}} without clearing the pending exception possibly thrown by {{FindClass}}. According to https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exception_handling this is illegal; you cannot call {{NewStringUTF}} while there is an exception pending. I think I missed this in HDFS-14321: {{hadoopRzOptionsSetByteBufferPool}} calls {{opts->byteBufferPool = (*env)->NewGlobalRef(env, byteBufferPool)}} but does not check for exceptions afterwards. > Fix JNI exception handling issues in libhdfs > > > Key: HDFS-14348 > URL: https://issues.apache.org/jira/browse/HDFS-14348 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > During some manual digging through the libhdfs code, we found several places > where we are not handling exceptions properly. 
> Specifically, there seem to be some violation of the following snippet from > the JNI Oracle docs > (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): > {quote} > *Exceptions and Error Codes* > Certain JNI functions use the Java exception mechanism to report error > conditions. In most cases, JNI functions report error conditions by returning > an error code and throwing a Java exception. The error code is usually a > special return value (such as NULL) that is outside of the range of normal > return values. Therefore, the programmer can quickly check the return value > of the last JNI call to determine if an error has occurred, and call a > function, ExceptionOccurred(), to obtain the exception object that contains a > more detailed description of the error condition. > There are two cases where the programmer needs to check for exceptions > without being able to first check an error code: > [1] The JNI functions that invoke a Java method return the result of the Java > method. The programmer must call ExceptionOccurred() to check for possible > exceptions that occurred during the execution of the Java method. > [2] Some of the JNI array access functions do not return an error code, but > may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. > In all other cases, a non-error return value guarantees that no exceptions > have been thrown. 
> {quote} > Here is a running list of issues: > * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but > does not check if an exception has occurred, it only checks if the result of > the method (in this case {{Class#getName(String)}}) returns {{NULL}} > * Exception handling in {{get_current_thread_id}} (both > {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) > seems to have several issues; lots of JNI methods are called without checking > for exceptions > * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} > in {{hdfs.c}} do not check for exceptions properly > ** e.g. for {{GetObjectArrayElement}} they only check if the result of the > operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for > pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
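The fix implied by the {{FindClass}}/{{newRuntimeError}} analysis in the comment above is to drain any already-pending exception before touching {{NewStringUTF}}. A hedged sketch of that ordering, with names mirroring the quoted snippets but not taken from the actual patch:

```c
#include <jni.h>
#include <stdarg.h>
#include <stdio.h>

/* Illustrative sketch: if an exception is already pending (e.g. thrown
 * by a failed FindClass), return it instead of calling further JNI
 * functions, since invoking NewStringUTF with an exception pending is
 * disallowed by the JNI spec's exception-handling rules. */
jthrowable new_runtime_error(JNIEnv *env, const char *fmt, ...)
{
    char buf[512];
    va_list ap;

    jthrowable pending = (*env)->ExceptionOccurred(env);
    if (pending) {
        (*env)->ExceptionClear(env);
        return pending;          /* hand back the original exception */
    }

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    jstring jstr = (*env)->NewStringUTF(env, buf);
    if (!jstr) {
        /* Likely an OutOfMemoryError rather than a RuntimeException. */
        pending = (*env)->ExceptionOccurred(env);
        (*env)->ExceptionClear(env);
        return pending;
    }
    /* ... construct a java.lang.RuntimeException from jstr as before ... */
    (*env)->DeleteLocalRef(env, jstr);
    return NULL;
}
```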
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790786#comment-16790786 ] Sahil Takiar commented on HDFS-3246: [~openinx] created a PR: https://github.com/apache/hadoop/pull/597 The latest version fixes a few issues reported by Hadoop QA, and adds some more Javadocs to {{CryptoInputStream}} to make the changes easier to understand. Let me see if I can find someone more comfortable with libhdfs to review the libhdfs code. > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790661#comment-16790661 ] Sahil Takiar commented on HDFS-14304: - Opened a PR since the patch to fix this is quite large: https://github.com/apache/hadoop/pull/595 - the PR description provides some more details about the approach taken. > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a > lookup in an {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects; it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits the rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
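One way to take {{hdfsHashMutex}} off the hot path, consistent with the contention analysis above, is to resolve each needed {{jclass}} once, promote it to a JNI global reference, and let every later call read the cached value without locking. The sketch below uses assumed names and a one-time-init pattern; the approach actually taken in pull/595 may differ:

```c
#include <jni.h>
#include <pthread.h>

/* Hypothetical cache of one jclass; a real implementation would hold
 * one slot per class used by hdfs.c. A JNI global ref is safe to share
 * across threads, so only the initialization needs synchronization. */
static jclass g_fs_class;               /* e.g. org.apache.hadoop.fs.FileSystem */
static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static JNIEnv *g_init_env;              /* env handed to the once-init callback */

static void init_cached_classes(void)
{
    JNIEnv *env = g_init_env;
    jclass local = (*env)->FindClass(env, "org/apache/hadoop/fs/FileSystem");
    if (local) {
        /* Promote to a global ref so it outlives this local frame. */
        g_fs_class = (*env)->NewGlobalRef(env, local);
        (*env)->DeleteLocalRef(env, local);
    }
    /* On failure g_fs_class stays NULL; the caller must check. */
}

jclass get_filesystem_class(JNIEnv *env)
{
    g_init_env = env;                   /* pthread_once callbacks take no args */
    pthread_once(&g_once, init_cached_classes);
    return g_fs_class;                  /* read-only after init: no mutex */
}
```

After initialization, every thread calling {{get_filesystem_class}} performs only an atomic once-check and a plain read, instead of acquiring a process-wide mutex per JNI invocation.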
[jira] [Updated] (HDFS-14304) High lock contention on hdfsHashMutex in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14304: Status: Patch Available (was: Open) > High lock contention on hdfsHashMutex in libhdfs > > > Key: HDFS-14304 > URL: https://issues.apache.org/jira/browse/HDFS-14304 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > > While doing some performance profiling of an application using libhdfs, we > noticed a high amount of lock contention on the {{hdfsHashMutex}} defined in > {{hadoop-hdfs-native-client/src/main/native/libhdfs/os/mutexes.h}} > The issue is that every JNI method invocation done by {{hdfs.c}} goes through > a helper method called {{invokeMethod}}. {{invokeMethod}} calls > {{globalClassReference}} which acquires {{hdfsHashMutex}} while performing a > lookup in a {{htable}} (a custom hash table that lives in {{libhdfs/common}}) > (the lock is acquired for both reads and writes). The hash table maps {{char > *className}} to {{jclass}} objects, it seems the goal of the hash table is to > avoid repeatedly creating {{jclass}} objects for each JNI call. > For multi-threaded applications, this lock severely limits that rate at which > Java methods can be invoked. 
pstacks show a lot of time being spent on > {{hdfsHashMutex}} > {code:java} > #0 0x7fba2dbc242d in __lll_lock_wait () from /lib64/libpthread.so.0 > #1 0x7fba2dbbddcb in _L_lock_812 () from /lib64/libpthread.so.0 > #2 0x7fba2dbbdc98 in pthread_mutex_lock () from /lib64/libpthread.so.0 > #3 0x027d8386 in mutexLock () > #4 0x027d0e7b in globalClassReference () > #5 0x027d1160 in invokeMethod () > #6 0x027d4176 in readDirect () > #7 0x027d4325 in hdfsRead () > {code} > Same with {{perf report}} > {code:java} > + 63.36% 0.01% [k] system_call_fastpath > + 61.60% 0.12% [k] sys_futex > + 61.45% 0.13% [k] do_futex > + 57.54% 0.49% [k] _raw_qspin_lock > + 57.07% 0.01% [k] queued_spin_lock_slowpath > + 55.47%55.47% [k] native_queued_spin_lock_slowpath > - 35.68% 0.00% [k] 0x6f6f6461682f6568 >- 0x6f6f6461682f6568 > - 30.55% __lll_lock_wait > - 29.40% system_call_fastpath > - 29.39% sys_futex >- 29.35% do_futex > - 29.27% futex_wait > - 28.17% futex_wait_setup > - 27.05% _raw_qspin_lock >- 27.05% queued_spin_lock_slowpath > 26.30% native_queued_spin_lock_slowpath > + 0.67% ret_from_intr > + 0.71% futex_wait_queue_me > - 2.00% methodIdFromClass > - 1.94% jni_GetMethodID > - 1.71% get_method_id > 0.96% SymbolTable::lookup_only > - 1.61% invokeMethod > - 0.62% jni_CallLongMethodV > 0.52% jni_invoke_nonstatic > 0.75% pthread_mutex_lock > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14348) Fix JNI exception handling issues in libhdfs
Sahil Takiar created HDFS-14348: --- Summary: Fix JNI exception handling issues in libhdfs Key: HDFS-14348 URL: https://issues.apache.org/jira/browse/HDFS-14348 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, libhdfs, native Reporter: Sahil Takiar Assignee: Sahil Takiar During some manual digging through the libhdfs code, we found several places where we are not handling exceptions properly. Specifically, there seem to be some violation of the following snippet from the JNI Oracle docs (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#exceptions_and_error_codes): {quote} *Exceptions and Error Codes* Certain JNI functions use the Java exception mechanism to report error conditions. In most cases, JNI functions report error conditions by returning an error code and throwing a Java exception. The error code is usually a special return value (such as NULL) that is outside of the range of normal return values. Therefore, the programmer can quickly check the return value of the last JNI call to determine if an error has occurred, and call a function, ExceptionOccurred(), to obtain the exception object that contains a more detailed description of the error condition. There are two cases where the programmer needs to check for exceptions without being able to first check an error code: [1] The JNI functions that invoke a Java method return the result of the Java method. The programmer must call ExceptionOccurred() to check for possible exceptions that occurred during the execution of the Java method. [2] Some of the JNI array access functions do not return an error code, but may throw an ArrayIndexOutOfBoundsException or ArrayStoreException. In all other cases, a non-error return value guarantees that no exceptions have been thrown. 
{quote} Here is a running list of issues: * {{classNameOfObject}} in {{jni_helper.c}} calls {{CallObjectMethod}} but does not check if an exception has occurred, it only checks if the result of the method (in this case {{Class#getName(String)}}) returns {{NULL}} * Exception handling in {{get_current_thread_id}} (both {{posix/thread_local_storage.c}} and {{windows/thread_local_storage.c}}) seems to have several issues; lots of JNI methods are called without checking for exceptions * Most of the calls to {{GetObjectArrayElement}} and {{GetByteArrayRegion}} in {{hdfs.c}} do not check for exceptions properly ** e.g. for {{GetObjectArrayElement}} they only check if the result of the operation is {{NULL}}, but they should call {{ExceptionOccurred}} to look for pending exceptions as well -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788246#comment-16788246 ] Sahil Takiar commented on HDFS-3246: Re-based the patch, which required some updates now that HDFS-14111 has been merged. Re-basing on top of HDFS-14111 required adding {{StreamCapabilities}} support, which, from the discussion in HBASE-22005, seems like something we wanted to add anyway. Fixed a few checkstyle issues and added a few more code comments. > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open) > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch, HDFS-3246.005.patch, > HDFS-3246.006.patch, HDFS-3246.007.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Attachment: HDFS-3246.007.patch
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786918#comment-16786918 ] Sahil Takiar commented on HDFS-14111: - Thank you [~jojochuang]! > hdfsOpenFile on HDFS causes unnecessary IO from file offset 0 > - > > Key: HDFS-14111 > URL: https://issues.apache.org/jira/browse/HDFS-14111 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, libhdfs >Affects Versions: 3.2.0 >Reporter: Todd Lipcon >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, > HDFS-14111.003.patch > > > hdfsOpenFile() calls readDirect() with a 0-length argument in order to check > whether the underlying stream supports bytebuffer reads. With DFSInputStream, > the read(0) isn't short circuited, and results in the DFSClient opening a > block reader. In the case of a remote block, the block reader will actually > issue a read of the whole block, causing the datanode to perform unnecessary > IO and network transfers in order to fill up the client's TCP buffers. This > causes performance degradation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
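The capability probe described in the issue above (a 0-length readDirect) only causes harm because the zero-length read is not short-circuited before the block reader is set up. A minimal sketch of such a guard — the names here are hypothetical stand-ins, not the actual DFSInputStream code, and the real HDFS-14111 fix may be structured differently:

```java
import java.nio.ByteBuffer;

// Sketch of a stream whose read(ByteBuffer) returns early for zero-length
// requests, so a 0-byte capability probe never triggers any block I/O.
// "blockReaderOpened" is a hypothetical stand-in for the expensive path.
public class ShortCircuitRead {
    private boolean blockReaderOpened = false;

    public int read(ByteBuffer buf) {
        if (!buf.hasRemaining()) {
            return 0;              // short-circuit: no I/O for a 0-byte read
        }
        blockReaderOpened = true;  // stand-in for opening a block reader
        int n = buf.remaining();
        for (int i = 0; i < n; i++) {
            buf.put((byte) 0);     // stand-in for actual data transfer
        }
        return n;
    }

    public boolean blockReaderOpened() {
        return blockReaderOpened;
    }
}
```

With this guard, a probe like `read(ByteBuffer.allocate(0))` returns 0 without touching the datanode, while any non-empty read still goes through the normal path.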
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786917#comment-16786917 ] Sahil Takiar commented on HDFS-3246: [~openinx], I agree with Steve. Is it possible for HBase to run their tests against a mini-HDFS cluster? That way the tests could use {{DFSInputStream}}, which does support the {{ByteBuffer}} interfaces.
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784761#comment-16784761 ] Sahil Takiar commented on HDFS-3246: [~jojochuang] yes, the implementation of {{decrypt}} was the trickiest part of this patch for me, mainly because I'm not familiar with how {{CryptoInputStream}} works. If there are other HDFS committers more familiar with this code, feedback is welcome. The new {{decrypt}} method is meant to be a merge of {{decrypt(long position, byte[] buffer, int offset, int length)}} and {{decrypt(ByteBuffer buf, int n, int start)}}. I made sure to add plenty of unit tests to verify that the changes in {{CryptoInputStream}} work properly. As for your specific comments: {quote}buf.position(start + len); --> not needed? {quote} The call to {{inBuffer.put(buf)}} on line 4 increments the position of {{buf}}, which is why it is necessary to reset the position. {quote}buf.limit(limit); --> not needed? {quote} The limit of the buffer is changed from its original limit on line 3; this call just resets the limit of {{buf}} back to its original value, which is necessary because the method is not supposed to change the limit of the buffer. {quote}len += outBuffer.remaining(); --> len += Math.min(n - len, inBuffer.remaining())? {quote} The next line calls {{buf.put(outBuffer)}}, which will add all remaining bytes in {{outBuffer}} to {{buf}} (up until the limit of {{buf}} is reached). So adding {{outBuffer.remaining()}} should be fine. Technically, it's not an accurate representation of how much data has been decrypted, because the {{buf}} limit can be reached before all {{outBuffer}} bytes are transferred, but given how the method is structured it seems to be fine. The logic here is similar to what {{decrypt(ByteBuffer buf, int n, int start)}} does. Essentially, this method decrypts {{buf}} chunk by chunk.
It loads a chunk of the {{buf}} into a {{localInBuffer}} (which is borrowed from a buffer pool) and writes the decrypted data into a {{localOutBuffer}} (also borrowed from a buffer pool). Then it overwrites the chunk of {{buf}} that was originally loaded into the {{localInBuffer}}. Overall, I agree the way this method is structured is a bit confusing, but it's in line with how the other decrypt methods work, which is why I wrote it this way. I can add some more code comments if that will help as well.
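The chunk-wise buffer bookkeeping discussed above — narrowing {{buf}}'s limit to one chunk, copying it out, rewinding the position, restoring the limit, then overwriting the chunk in place — can be sketched in isolation. This is a simplified illustration with the cipher replaced by a plain copy; it is not the actual {{CryptoInputStream}} code, and the scratch buffers here are allocated directly rather than borrowed from a pool:

```java
import java.nio.ByteBuffer;

// Processes buf in fixed-size chunks via scratch in/out buffers, overwriting
// each chunk in place and restoring buf's original limit (the method must not
// change it). The "decrypt" step is replaced by an identity copy.
public class ChunkedTransform {
    static final int CHUNK = 4;

    public static void transform(ByteBuffer buf) {
        int start = buf.position();
        int limit = buf.limit();                          // remember caller's limit
        ByteBuffer inBuffer = ByteBuffer.allocate(CHUNK);  // pooled in the real code
        ByteBuffer outBuffer = ByteBuffer.allocate(CHUNK); // pooled in the real code
        int n = limit - start;
        int len = 0;
        while (len < n) {
            int chunk = Math.min(CHUNK, n - len);
            inBuffer.clear();
            buf.limit(start + len + chunk);   // narrow buf to this chunk
            inBuffer.put(buf);                // advances buf's position
            buf.position(start + len);        // rewind so the chunk can be overwritten
            buf.limit(limit);                 // restore the original limit
            inBuffer.flip();
            outBuffer.clear();
            outBuffer.put(inBuffer);          // stand-in for the decrypt step
            outBuffer.flip();
            len += outBuffer.remaining();
            buf.put(outBuffer);               // overwrite the chunk in place
        }
    }
}
```

After `transform` returns, `buf`'s limit is unchanged and its position has advanced to the limit, mirroring the invariants the comment above describes.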
[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783841#comment-16783841 ] Sahil Takiar commented on HDFS-14111: - I think this patch should be good to merge. I talked to a few other folks about {{errno}} handling, and given that the C docs say {{The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.}}, I think the test changes made as part of this patch should be fine.
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Attachment: HDFS-3246.006.patch
[jira] [Commented] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
[ https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782214#comment-16782214 ] Sahil Takiar commented on HDFS-14321: - Digging into this some more, I think {{hadoopRzOptions::byteBufferPool}} should be a global ref. If {{hadoopRzOptionsSetByteBufferPool}} simply created a local ref to {{hadoopRzOptions::byteBufferPool}}, then the reference would be lost once execution returned back to Java. By using a global ref, we ensure that {{byteBufferPool}} does not get garbage collected by the JVM. Since the {{byteBufferPool}} is expected to live across calls to {{hadoopReadZero}}, using a local ref does not make sense. This is based on my understanding of the JNI and the difference between local vs. global references: http://journals.ecs.soton.ac.uk/java/tutorial/native1.1/implementing/refs.html I'm not a JNI expert, so my understanding might be off, but this patch fixes the {{FATAL ERROR}}. The second part of this patch is to add {{-Xcheck:jni}} to {{LIBHDFS_OPTS}} when running all the libhdfs ctests. The drawback here is that adding this pollutes the logs with a bunch of warnings about exception handling (see above). The benefit is that it ensures we don't make any changes to libhdfs that would result in more fatal errors. IMO we can live with the extraneous logging, but I'm open to changing this if others feel differently. > Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled > - > > Key: HDFS-14321 > URL: https://issues.apache.org/jira/browse/HDFS-14321 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, libhdfs, native >Reporter: Sahil Takiar >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-14321.001.patch > > > The JVM exposes an option called {{-Xcheck:jni}} which runs various checks > against JNI usage by applications.
Further explanation of this JVM option can > be found in: > [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html] > and > [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html]. > When run with this option, the JVM will print out any warnings or errors it > encounters with the JNI. > We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to > {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to > our ctest runs as well to ensure no regressions are introduced to libhdfs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
[ https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14321: Attachment: HDFS-14321.001.patch
[jira] [Updated] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
[ https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14321: Status: Patch Available (was: Open)
[jira] [Comment Edited] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
[ https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16778763#comment-16778763 ] Sahil Takiar edited comment on HDFS-14321 at 3/1/19 10:04 PM: -- When running the existing tests with {{-Xcheck:jni}} I only see one error: {{FATAL ERROR in native method: Invalid global JNI handle passed to DeleteGlobalRef}}, which seems to be caused by {{hadoopRzOptionsFree}} calling {{DeleteGlobalRef}} on {{opts->byteBufferPool}}, which is not a global ref. It's not clear to me how big an issue this is, since {{opts->byteBufferPool}} should be a local ref that is automatically deleted when the native method exits. There are a bunch of warnings of the form {{WARNING in native method: JNI call made without checking exceptions when required to from ...}} - after debugging these warnings, most of them seem to be caused by the JVM itself (e.g. internal JDK code). So they would have to be fixed within the JDK itself.
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Attachment: HDFS-3246.005.patch
[jira] [Updated] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-3246: --- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782101#comment-16782101 ] Sahil Takiar commented on HDFS-3246: [~anoop.hbase] {quote} When calling this API with buf remaining size of n and the file is having data size > n after given position, is it guaranteed to read the whole n bytes into BB in one go? Just wanted to confirm. Thanks. {quote} Unfortunately, the existing APIs aren't clear on this behavior. The {{ByteBufferPositionedReadable}} interface is meant to follow the same semantics as {{PositionedReadable}} and {{ByteBufferReadable}}. {{PositionedReadable}} says it "Read[s] up to the specified number of bytes" and {{ByteBufferReadable}} says it "Reads up to buf.remaining() bytes". In practice, it looks like pread in {{DFSInputStream}} follows the behavior you have described, e.g. it either reads until {{ByteBuffer#hasRemaining()}} returns false, or there are no more bytes in the file. {{ByteBufferPositionedReadable}} should follow the same behavior for {{DFSInputStream}}. [~jojochuang] thanks for the review comments. I've got this implemented for {{CryptoInputStream}} as well and will post a patch soon. As far as testing goes, I've tested the libhdfs path via Impala on a real cluster and everything seemed to be working as expected (I have not tested against an encrypted HDFS cluster). Will post an updated patch shortly.
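A {{readFully}} with the semantics discussed above is typically a loop over the positioned read that fails with {{EOFException}} if the stream ends before the buffer is filled. A hedged sketch — the interface here is a simplified stand-in for {{ByteBufferPositionedReadable}}, not the actual Hadoop source:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// readFully keeps calling read(position, buf) until buf has no remaining
// space, advancing the read position by however many bytes each call
// returned, and throws EOFException if the stream ends first.
public class ReadFullySketch {
    // Simplified stand-in for ByteBufferPositionedReadable: returns the
    // number of bytes read, or -1 at end of stream.
    interface BBPositionedReadable {
        int read(long position, ByteBuffer buf) throws IOException;
    }

    static void readFully(BBPositionedReadable in, long position, ByteBuffer buf)
            throws IOException {
        while (buf.hasRemaining()) {
            int n = in.read(position, buf);
            if (n < 0) {
                throw new EOFException("reached end of stream before buffer was filled");
            }
            position += n;
        }
    }
}
```

This makes the contract explicit: a plain `read` may return fewer than `buf.remaining()` bytes, while `readFully` either fills the buffer completely or throws.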
[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780727#comment-16780727 ] Sahil Takiar commented on HDFS-14111: - Posted an updated patch that addresses Todd's comments. The {{test_hdfs_ext_hdfspp_test_shim_static}} failure seems to be a genuine issue caused by this patch: libhdfs++ and libhdfs handle {{errno}} inconsistently. For now, I disabled the failing assertion and filed a follow-up JIRA, HDFS-14325, which has more details; once it is fixed, we can add the assertion back in.
[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14111: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14111: Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14111:
Attachment: HDFS-14111.003.patch
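The cost model behind HDFS-14111 can be sketched in a few lines of C. This is a hypothetical simulation, not the real hdfs.c code: `probe_via_zero_read` mimics the old hdfsOpenFile() behavior where a zero-length readDirect() still opens a block reader, while `probe_via_capability` mimics the fix direction, where a StreamCapabilities-style flag answers the question without any IO. All names here are illustrative.

```c
#include <assert.h>

/* Hypothetical model of the probe described above. In the real code,
 * hdfsOpenFile() issued a zero-length readDirect() just to learn whether
 * the stream supports byte-buffer reads; DFSInputStream does not
 * short-circuit len == 0, so the probe itself opened a block reader. */

struct mock_stream {
    int block_reader_opens;   /* counts expensive block-reader setups */
    int supports_byte_buffer; /* what a capability flag would report  */
};

/* Probe via a zero-length read: pays the block-reader cost even though
 * no bytes are wanted, mirroring the behavior the JIRA complains about. */
static int probe_via_zero_read(struct mock_stream *s) {
    s->block_reader_opens++;  /* read(0) is not short-circuited */
    return s->supports_byte_buffer;
}

/* Probe via a capability flag (the HDFS-14111 fix direction): no IO. */
static int probe_via_capability(const struct mock_stream *s) {
    return s->supports_byte_buffer;
}
```

Both probes return the same answer; the difference is that only the first one performs work proportional to opening a real block reader on the datanode.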
[jira] [Comment Edited] (HDFS-14325) Revise usage of errno in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780677#comment-16780677 ] Sahil Takiar edited comment on HDFS-14325 at 2/28/19 4:05 PM:
--------------------------------------------------------------
[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any comments on this since it involves libhdfs++ The TL;DR is that libhdfs and libhdfs++ have inconsistent handling of setting {{errno}} on success, which causes issues for some of the shim-based libhdfs++ tests.

was (Author: stakiar):
[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any comments on this since it involves libhdfs++. The TL;DR is that libhdfs and libhdfs++ have inconsistent handling of setting {{errno}} on success, which causes issues for some of the shim-based libhdfs++ tests.

> Revise usage of errno in libhdfs
> --------------------------------
>
>                 Key: HDFS-14325
>                 URL: https://issues.apache.org/jira/browse/HDFS-14325
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, libhdfs, native
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, HDFS-3579, HDFS-8407, etc.
> As a result of these changes, some libhdfs functions set {{errno}} to 0 on success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set {{errno}} to a meaningful value only on error.
> libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) libhdfs++ operations. See [this|https://issues.apache.org/jira/browse/HDFS-10511?focusedCommentId=15322696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15322696] comment in HDFS-10511 for why that was done.
> The inconsistent behavior between libhdfs and libhdfs++ causes issues for tests such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim layer ({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and {{hdfs.cc}} for various operations (e.g. opening / closing files uses both APIs, {{hdfsWrite}} delegates to libhdfs since libhdfs++ does not support writes yet). The tests expect {{errno}} to be set to 0 after successful operations against the shim layer. Since libhdfs is not guaranteed to set {{errno}} to 0 on success, tests can start failing.
> One example of the inconsistency causing issues is HDFS-14111; the patch for HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. However, from libhdfs's perspective this seems to be by design. Quoting from the {{errno}} C docs:
> {quote}The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
> {quote}
> I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} value, but I isolated the change to the {{FileSystem#close}} call. Most likely, some C function invoked as a result of calling {{#close}} failed and changed the {{errno}} value, but the {{#close}} was still able to succeed (this is most likely expected behavior; see [this|https://issues.apache.org/jira/browse/HDFS-8407?focusedCommentId=14551225&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551225] comment in HDFS-8407 for further validation).
> Going forward we could (1) set {{errno}} to 0 for all successful libhdfs functions, which would make the libhdfs behavior consistent with the libhdfs++ behavior, or (2) we could live with the discrepancy, which would require modifying the libhdfs++ shim tests (which assert that {{errno}} is 0 after certain operations) and just documenting the difference in behavior.
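The errno pitfall described above, and the normalization that option (1) proposes, can be demonstrated in a few lines of standalone C. The functions below are hypothetical stand-ins, not libhdfs code: `close_like_call` models a call (such as {{hdfsCloseFile}}) that succeeds overall even though some inner C call set {{errno}}, and `close_like_call_normalized` shows the wrapper style that clears {{errno}} on success so shim-based tests see a consistent 0.

```c
#include <assert.h>
#include <errno.h>

/* A "close"-like call that succeeds overall but clobbers errno internally.
 * Per the C convention quoted above, errno is only meaningful after a call
 * has actually *reported* failure, so this is legal behavior. */
static int close_like_call(void) {
    errno = ENOENT;   /* some inner call failed and set errno (ENOENT == 2) */
    return 0;         /* ... but the operation as a whole succeeded */
}

/* Option (1) from the discussion above: normalize errno to 0 on success,
 * matching what libhdfs++ already does for all successful operations. */
static int close_like_call_normalized(void) {
    int ret = close_like_call();
    if (ret == 0)
        errno = 0;    /* success: present a consistent errno to callers */
    return ret;
}
```

Note that ENOENT is 2, which matches the observed errno change "from 0 to 2" after the HDFS-14111 patch.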
[jira] [Commented] (HDFS-14325) Revise usage of errno in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780677#comment-16780677 ] Sahil Takiar commented on HDFS-14325:
--------------------------------------
[~bobhansen], [~anatoli.shein], [~James Clampffer] wondering if you have any comments on this since it involves libhdfs++. The TL;DR is that libhdfs and libhdfs++ have inconsistent handling of setting {{errno}} on success, which causes issues for some of the shim-based libhdfs++ tests.
[jira] [Updated] (HDFS-14325) Revise usage of errno in libhdfs
[ https://issues.apache.org/jira/browse/HDFS-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HDFS-14325:
Description:
The usage of [errno|http://man7.org/linux/man-pages/man3/errno.3.html] in libhdfs has gone through several changes in the past: HDFS-3675, HDFS-4997, HDFS-3579, HDFS-8407, etc.
As a result of these changes, some libhdfs functions set {{errno}} to 0 on success ({{hadoopReadZero}}, {{hdfsListDirectory}}), while several set {{errno}} to a meaningful value only on error.
libhdfs++ on the other hand sets {{errno}} to 0 for all (successful) libhdfs++ operations. See [this|https://issues.apache.org/jira/browse/HDFS-10511?focusedCommentId=15322696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15322696] comment in HDFS-10511 for why that was done.
The inconsistent behavior between libhdfs and libhdfs++ causes issues for tests such as {{test_hdfs_ext_hdfspp_test_shim_static}} which uses a shim layer ({{tests/hdfs_shim.c}}) that delegates to both {{hdfs.c}} and {{hdfs.cc}} for various operations (e.g. opening / closing files uses both APIs, {{hdfsWrite}} delegates to libhdfs since libhdfs++ does not support writes yet). The tests expect {{errno}} to be set to 0 after successful operations against the shim layer. Since libhdfs is not guaranteed to set {{errno}} to 0 on success, tests can start failing.
One example of the inconsistency causing issues is HDFS-14111; the patch for HDFS-14111 happens to change the {{errno}} from 0 to 2 for {{hdfsCloseFile}}. However, from libhdfs's perspective this seems to be by design. Quoting from the {{errno}} C docs:
{quote}The value in errno is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
{quote}
I was not able to pin down why the patch for HDFS-14111 changed the {{errno}} value, but I isolated the change to the {{FileSystem#close}} call. Most likely, some C function invoked as a result of calling {{#close}} failed and changed the {{errno}} value, but the {{#close}} was still able to succeed (this is most likely expected behavior; see [this|https://issues.apache.org/jira/browse/HDFS-8407?focusedCommentId=14551225&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14551225] comment in HDFS-8407 for further validation).
Going forward we could (1) set {{errno}} to 0 for all successful libhdfs functions, which would make the libhdfs behavior consistent with the libhdfs++ behavior, or (2) we could live with the discrepancy, which would require modifying the libhdfs++ shim tests (which assert that {{errno}} is 0 after certain operations) and just documenting the difference in behavior.
[jira] [Created] (HDFS-14325) Revise usage of errno in libhdfs
Sahil Takiar created HDFS-14325:
-----------------------------------
             Summary: Revise usage of errno in libhdfs
                 Key: HDFS-14325
                 URL: https://issues.apache.org/jira/browse/HDFS-14325
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs-client, libhdfs, native
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar
[jira] [Commented] (HDFS-14083) libhdfs logs errors when opened FS doesn't support ByteBufferReadable
[ https://issues.apache.org/jira/browse/HDFS-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779549#comment-16779549 ] Sahil Takiar commented on HDFS-14083:
--------------------------------------
[~yzhangal] HDFS-14111 fixes this. It removes the log line {{UnsupportedOperationException: Byte-buffer read unsupported by input stream}} completely. Given the approach in HDFS-14111, I don't think the log line is ever necessary. The {{StreamCapabilities}} interface in Hadoop captures whether or not a stream supports the {{readDirect}} code path.
I suggest we close this JIRA in favor of HDFS-14111. The solution in this JIRA is really just masking the problem rather than fixing it. As far as I understand, it decreases the frequency at which the logging occurs, whereas HDFS-14111 removes the logging completely. This has multiple benefits: (1) it avoids collecting a stack trace for each file open, and (2) simply decreasing the frequency at which logging occurs still causes confusion for users; the log indicates that some error occurred, whereas in reality lack of support for {{readDirect}} is an expected limitation.
> libhdfs logs errors when opened FS doesn't support ByteBufferReadable
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14083
>                 URL: https://issues.apache.org/jira/browse/HDFS-14083
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: libhdfs, native
>    Affects Versions: 3.0.3
>            Reporter: Pranay Singh
>            Assignee: Pranay Singh
>            Priority: Minor
>         Attachments: HADOOP-15928.001.patch, HADOOP-15928.002.patch, HDFS-14083.003.patch, HDFS-14083.004.patch, HDFS-14083.005.patch, HDFS-14083.006.patch, HDFS-14083.007.patch, HDFS-14083.008.patch, HDFS-14083.009.patch
>
> Problem:
> There is excessive error logging when a file is opened by libhdfs (DFSClient/HDFS) in an S3 environment. This issue is caused because buffered read is not supported in the S3 environment; see HADOOP-14603 "S3A input stream to support ByteBufferReadable".
> The following message is printed repeatedly in the error log / to STDERR:
> {code}
> UnsupportedOperationException: Byte-buffer read unsupported by input stream
> java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:150)
> {code}
> h3. Root cause
> After investigating the issue, it appears that the above exception is printed because when a file is opened, {{hdfsOpenFileImpl()}} calls {{readDirect()}}, which is hitting this exception.
> h3. Fix:
> Since the hdfs client is not initiating the byte-buffered read, and it happens in an implicit manner, we should not be generating the error log during open of a file.
[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
[ https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779551#comment-16779551 ] Sahil Takiar commented on HDFS-14111:
--------------------------------------
Thanks for the feedback Todd, will address the issue with {{newJavaStr}}. Yes, this fixes HDFS-14083 as well. I think we can close HDFS-14083 in favor of the approach taken here (just left a comment on HDFS-14083; if no one objects, I will close that JIRA).
{{test_hdfs_ext_hdfspp_test_shim_static}} seems to be consistently failing in Hadoop QA; still trying to figure out why.
[jira] [Commented] (HDFS-14321) Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
[ https://issues.apache.org/jira/browse/HDFS-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778763#comment-16778763 ] Sahil Takiar commented on HDFS-14321:
--------------------------------------
When running the existing tests with {{-Xcheck:jni}} I only see one error: {{FATAL ERROR in native method: Invalid global JNI handle passed to DeleteGlobalRef}}, which seems to be caused by {{hadoopRzOptionsFree}} calling {{DeleteGlobalRef}} on {{opts->byteBufferPool}}, which is not a global ref. It's not clear to me how big an issue this is, since {{opts->byteBufferPool}} should be a local ref that is automatically deleted when the native method exits.
There are a bunch of warnings of the form {{WARNING in native method: JNI call made without checking exceptions when required to from ...}}: after debugging these warnings, most of them seem to be caused by the JVM itself (e.g. internal JDK code), so they would have to be fixed within the JDK itself.

> Fix -Xcheck:jni issues in libhdfs, run ctest with -Xcheck:jni enabled
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14321
>                 URL: https://issues.apache.org/jira/browse/HDFS-14321
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, libhdfs, native
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> The JVM exposes an option called {{-Xcheck:jni}} which runs various checks against JNI usage by applications. Further explanation of this JVM option can be found in: [https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/clopts002.html] and [https://www.ibm.com/support/knowledgecenter/en/SSYKE2_8.0.0/com.ibm.java.vm.80.doc/docs/jni_debug.html]. When run with this option, the JVM will print out any warnings or errors it encounters with the JNI.
> We should run the libhdfs tests with {{-Xcheck:jni}} (can be added to {{LIBHDFS_OPTS}}) and fix any warnings / errors. We should add this option to our ctest runs as well to ensure no regressions are introduced to libhdfs.
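The double-DeleteLocalRef failure mode that {{-Xcheck:jni}} now turns into a fatal error (see the related HDFS-14846 discussion) has a standard defensive fix: NULL the ref variable after deleting it, and make the delete helper a no-op for NULL. The sketch below is a standalone simulation, not libhdfs code; `mock_delete_local_ref` stands in for the real `(*env)->DeleteLocalRef(env, ref)` call, which would abort the JVM with "Bad global or local ref passed to JNI" if handed a stale ref.

```c
#include <assert.h>
#include <stddef.h>

static int delete_calls;  /* how many deletes the mock "JNI" actually saw */

/* Stand-in for (*env)->DeleteLocalRef. A real JVM under -Xcheck:jni would
 * abort here if the ref had already been deleted; we just count calls. */
static void mock_delete_local_ref(void *ref) {
    (void)ref;
    delete_calls++;
}

/* Guarded helper: deletes the ref at most once, then clears the caller's
 * pointer so a second call through this helper is a harmless no-op. */
static void safe_delete_local_ref(void **ref) {
    if (ref == NULL || *ref == NULL)
        return;                   /* nothing to delete, or already deleted */
    mock_delete_local_ref(*ref);
    *ref = NULL;                  /* prevent a second delete of the same ref */
}
```

The key design point is that the helper takes `void **` rather than `void *`: clearing the caller's own variable is what makes the second delete attempt detectable and skippable.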